Proteins having glycosyltransferase activity

The invention relates to proteins having glycosyltransferase activity and to a recombinant
process for the production of proteins having glycosyltransferase activity.

Glycosyltransferases transfer sugar residues from an activated donor substrate, usually a 
nucleotide sugar, to a specific acceptor sugar thus forming a glycosidic linkage. Based on 
the type of sugar transferred, these enzymes are grouped into families, e.g. galactosyl- 
transferases, sialyltransferases and fucosyltransferases. Being resident membrane proteins 
primarily located in the Golgi apparatus, the glycosyltransferases share a common domain 
structure consisting of a short amino-terminal cytoplasmic tail, a signal-anchor domain, 
and an extended stem region which is followed by a large carboxy-terminal catalytic 
domain. The signal-anchor or membrane domain acts as both uncleavable signal peptide 
and as membrane spanning region and orients the catalytic domain of the 
glycosyltransferase within the lumen of the Golgi apparatus. The luminal stem or spacer
region is supposed to serve as a flexible tether, allowing the catalytic domain to
glycosylate carbohydrate groups of membrane-bound and soluble proteins of the secretory
pathway en route through the Golgi apparatus. Furthermore, the stem portion was
discovered to function as retention signal to keep the enzyme bound to the Golgi 
membrane (PCT Application No. 91/06635). Soluble forms of glycosyltransferases are 
found in milk, serum and other body fluids. These soluble glycosyltransferases are 
supposed to result from proteolytic release from the corresponding membrane-bound 
forms of the enzymes by endogenous proteases. 

Glycosyltransferases are valuable tools for the synthesis or modification of glycoproteins, 
glycolipids and oligosaccharides. Enzymatic synthesis of carbohydrate structures has the 
advantage of high stereo- and regioselectivity. In contrast to chemical methods the 
time-consuming introduction of protective groups is superfluous. However, enzymatic 
synthesis of carbohydrate structures has been a problem because glycosyltransferases are
not readily available. Therefore, production using recombinant DNA technology has been
worked on. For example, galactosyltransferases have been expressed in E. coli
(PCT 90/07000) and Chinese hamster ovary (CHO) cells (Smith, D.F. et al. (1990) J. Biol.
Chem. 265, 6225-34), sialyltransferases have been expressed in CHO cells (Lee, E.U.
(1990) Diss. Abstr. Int. B 50, 3453-4) and COS-1 cells (Paulson, J.C. et al. (1988) J. Cell.
Biol. 107, 10A), and fucosyltransferases have been produced in COS-1 cells (Goelz, S.E.
et al. (1990) Cell 63, 1349-1356; Larsen R.D. et al. (1990) Proc. Natl. Acad. Sci. USA 87,
6674-6678) and CHO cells (Potvin, B. (1990) J. Biol. Chem. 265, 1615-1622). Recently,
Paulson et al. have disclosed a method for producing soluble glycosyltransferases (U.S.
Patent No. 5,032,519). However, there still is a need for proteins having favorable
glycosylating properties and for advantageous methods for producing such proteins.

It is an object of the present invention to provide novel proteins having glycosyltransferase 
activity, recombinant DNA molecules encoding proteins having glycosyltransferase 
activity, hybrid vectors comprising such recombinant DNA molecules, transformed hosts 
suitable for the multiplication and/or expression of the recombinant DNA molecules, and 
processes for the preparation of the proteins, DNA molecules and hosts. 

The present invention concerns a protein having glycosyltransferase activity and 
comprising identical or different catalytically active domains of glycosyltransferases, e.g. 
hybrid proteins. 

Preferred is a protein of the invention which comprises two identical or two different 
catalytically active domains of glycosyltransferases. 

Particularly preferred is such a protein exhibiting two different glycosyltransferase
activities, i.e. a protein capable of transferring two different sugar residues.

Besides the catalytically active domains a protein of the invention may comprise 
additional amino acid sequences, particularly amino acid sequences of the respective 
glycosyltransferases. 

The invention also concerns a hybrid polypeptide chain, i.e. a hybrid protein, comprising a 
membrane-bound or soluble glycosyltransferase linked to a soluble glycosyltransferase.
For example, such a hybrid protein comprises a membrane-bound glycosyltransferase
linked to a soluble glycosyltransferase in N- to C-terminal order.

A glycosyltransferase is a protein exhibiting glycosyltransferase activity, i.e. transferring a
particular sugar residue from a donor molecule to an acceptor molecule. Examples are
N-acetylglucosaminyltransferases, N-acetylgalactosaminyltransferases,
mannosyltransferases, fucosyltransferases, galactosyltransferases and sialyltransferases.
Preferably, the glycosyltransferase is of mammalian, e.g. bovine, murine, rat or,
particularly, human origin.

Preferred are hybrid proteins exhibiting galactosyl- and sialyltransferase activity.

A membrane-bound glycosyltransferase is an enzyme which cannot be secreted by the cell 
it is produced by, e.g. a full-length enzyme. Examples of membrane-bound 
glycosyltransferases are the following galactosyltransferases: UDP-Galactose:
β-galactoside α(1-3)-galactosyltransferase (EC 2.4.1.151) which uses galactose as
acceptor substrate forming an α(1-3)-linkage and UDP-Galactose: β-N-acetylglucosamine
β(1-4)-galactosyltransferase (EC 2.4.1.22) which transfers galactose to
N-acetylglucosamine (GlcNAc) forming a β(1-4)-linkage. In the presence of
α-lactalbumin, said β(1-4)-galactosyltransferase also accepts glucose as an acceptor
substrate, thus catalysing the synthesis of lactose. An example of a membrane-bound
sialyltransferase is the CMP-NeuAc: β-galactoside α(2-6)-sialyltransferase (EC 2.4.99.1)
which forms the NeuAc-α(2-6)Gal-β(1-4)GlcNAc-sequence common to many N-linked
carbohydrate groups. 

A soluble glycosyltransferase is secretable by the host cell and is derivable from an
N-terminally truncated full-length (i.e. a membrane-bound) glycosyltransferase naturally
located in the Golgi apparatus. Such a soluble glycosyltransferase differs from the
corresponding full-length enzyme by lack of the cytoplasmic tail, the signal anchor and,
optionally, part or whole of the stem region. Examples of soluble glycosyltransferases
are galactosyltransferases differing from the protein with the amino acid sequence
depicted in SEQ ID NO. 1 in that they lack an NH2-terminal peptide comprising at least
41 amino acids. A soluble sialyltransferase is e.g. a sialyltransferase missing an
NH2-terminal peptide consisting of 26 to 61 amino acids as compared to the full-length
form depicted in SEQ ID NO. 3.
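
The relation between a full-length enzyme and its soluble form can be illustrated with a minimal sketch. The sequence used below is a made-up placeholder, not SEQ ID NO. 1 or SEQ ID NO. 3, and the function names are hypothetical; only the truncation limits (at least 41 residues for the galactosyltransferase, 26 to 61 residues for the sialyltransferase) are taken from the text.

```python
def truncate_n_terminus(full_length: str, n_removed: int) -> str:
    """Return what remains after removing n_removed N-terminal residues."""
    if not 0 <= n_removed < len(full_length):
        raise ValueError("n_removed must be between 0 and len(full_length) - 1")
    return full_length[n_removed:]


def is_allowed_gt_truncation(n_removed: int) -> bool:
    """Soluble galactosyltransferase: at least 41 N-terminal residues are missing."""
    return n_removed >= 41


def is_allowed_st_truncation(n_removed: int) -> bool:
    """Soluble sialyltransferase: 26 to 61 N-terminal residues are missing."""
    return 26 <= n_removed <= 61


# Placeholder 80-residue sequence (NOT a real glycosyltransferase).
placeholder = "M" + "A" * 79
soluble = truncate_n_terminus(placeholder, 41)
print(len(placeholder), len(soluble))
```

In practice the truncation is made at the DNA level, for example at a suitable restriction site, rather than on the protein string; the sketch only shows the bookkeeping of residue numbers.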

As used hereinbefore and hereinafter the term "glycosyltransferase" is intended to include 
variants with the provision that these variants are enzymatically active. Preferred are
variants of human origin. 

For example, a variant is a naturally occurring variant of a glycosyltransferase found 
within a particular species, e.g. a variant of a galactosyltransferase which differs from the
enzyme having the amino acid sequence with the SEQ ID NO. 1 in that it lacks serine in
position 11 and has the amino acids valine and tyrosine instead of alanine and leucine in
positions 31 and 32, respectively. Such a variant may be encoded by a related gene of the
same gene family or by an allelic variant of a particular gene. The term "variant" also
embraces a modified glycosyltransferase, e.g. a glycosyltransferase produced from a DNA
which has been subjected to in vitro mutagenesis, with the provision that the protein 
encoded by said DNA has the enzymatic activity of the authentic glycosyltransferase. 
Such modifications may consist in an addition, exchange and/or deletion of one or more 
amino acids, the latter resulting in shortened variants. An example of a shortened 
membrane-bound, catalytically active variant is the galactosyltransferase designated 
GT(1-396) consisting of amino acids 1 to 396 of the amino acid sequence depicted in SEQ
ID NO. 1.

Preferred hybrid proteins comprise a membrane-bound or soluble glycosyltransferase
linked to a soluble glycosyltransferase molecule, or a variant thereof, via a suitable linker
consisting of genetically encoded amino acids. A suitable linker is a molecule which does
not impair the favorable properties of the hybrid protein of the invention. The linker
connects the C-terminal amino acid of one glycosyltransferase molecule with the
N-terminal amino acid of the other glycosyltransferase molecule. For example, the
linker is a peptide consisting of about 1 to about 20, e.g. of about 8 amino acids. In a
preferred embodiment the linker, also referred to as adaptor, does not contain the amino
acid cysteine. Particularly preferred is a peptide linker having the sequence
Arg-Ala-Arg-Ile-Arg-Arg-Pro-Ala or Arg-Ala-Gly-Ile-Arg-Arg-Pro-Ala.
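
The linker criteria stated above (genetically encoded residues, about 1 to about 20 amino acids, no cysteine) can be sketched as a simple check. This is an illustration only: the function names are hypothetical, "about 1 to about 20" is treated as a strict range for simplicity, the fourth residue of the first linker is read as Ile, and the enzyme sequences "MGT" and "STK" are placeholders, not sequences from the sequence listing.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert e.g. 'Arg-Ala-Gly' to 'RAG' (the 20 encoded residues only)."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split("-"))


def is_suitable_linker(linker: str) -> bool:
    """Simplified reading of the text: 1 to 20 residues and no cysteine."""
    return 1 <= len(linker) <= 20 and "C" not in linker


def join_via_linker(n_terminal_enzyme: str, linker: str, c_terminal_enzyme: str) -> str:
    """Link the C-terminus of the first enzyme to the N-terminus of the second."""
    if not is_suitable_linker(linker):
        raise ValueError("linker does not meet the length/cysteine criteria")
    return n_terminal_enzyme + linker + c_terminal_enzyme


linker_a = to_one_letter("Arg-Ala-Arg-Ile-Arg-Arg-Pro-Ala")
linker_b = to_one_letter("Arg-Ala-Gly-Ile-Arg-Arg-Pro-Ala")
hybrid = join_via_linker("MGT", linker_a, "STK")  # placeholder enzyme sequences
print(linker_a, linker_b, hybrid)
```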

Preferred is a hybrid protein consisting of a galactosyltransferase linked to a 
sialyltransferase via a suitable peptide linker. 

Particularly preferred is a hybrid protein consisting of a membrane-bound 
galactosyltransferase the C-terminal amino acid of which is linked to the N-terminal
amino acid of a soluble sialyltransferase via a suitable peptide linker, e.g. a hybrid protein 
having the amino acid sequence set forth in SEQ ID NO. 6 or in SEQ ID NO. 8. 

The hybrid protein according to the invention can be prepared by recombinant DNA 
techniques comprising culturing a suitable transformed yeast strain under conditions 
which allow the expression of the DNA encoding said hybrid protein. Subsequently, the 
enzymatic activity may be recovered. 

In a preferred embodiment, the desired compounds are manufactured in a process
comprising 

a) providing an expression vector comprising an expression cassette containing a DNA 
sequence coding for a hybrid protein, 

b) transferring the expression vector into a suitable yeast strain, 

c) culturing the transformed yeast strain under conditions which allow expression of the
hybrid protein, and 

d) recovering the enzymatic activity. 

The steps involved in the preparation of the hybrid proteins by means of recombinant 
techniques will be discussed in more detail hereinbelow. 

The invention further relates to a recombinant DNA molecule encoding a hybrid protein of
the invention. Preferred are DNA molecules coding for the preferred hybrid proteins. 

The nucleotide sequence encoding a particular glycosyltransferase is known from the 
literature or can be deduced from the amino acid sequence of the protein according to 
conventional rules. Starting from the nucleotide sequences encoding the desired 
glycosyltransferase activities, a DNA molecule encoding the desired hybrid protein can be 
deduced and constructed according to methods well known in the art including, but not
limited to, the use of polymerase chain reaction (PCR) technology, DNA restriction
enzymes, synthetic oligonucleotides, DNA ligases and DNA amplification techniques.
Alternatively, the nucleotide sequence encoding the hybrid protein of the invention may
be synthesized by chemical methods known in the art or by combining chemical with
recombinant methods. 

The DNA coding for a particular glycosyltransferase may be obtained from cell sources by 
conventional methods, e.g. by making use of cDNA technology, from vectors in the art or 
by chemical synthesis of the DNA. 

More specifically, DNA encoding a membrane-bound glycosyltransferase can be prepared
by methods known in the art and includes genomic DNA, e.g. DNA isolated from a
mammalian genomic DNA library, e.g. from rat, murine, bovine or human cells. If
necessary, the introns occurring in genomic DNA encoding the enzyme are deleted.
Furthermore, DNA encoding a membrane-bound glycosyltransferase comprises cDNA
which can be isolated from a mammalian cDNA library or produced from the
corresponding mRNA. The cDNA library may be derived from cells from different
tissues, e.g. placenta cells or liver cells. The preparation of cDNA via the mRNA route is
achieved using conventional methods such as the polymerase chain reaction (PCR).

A DNA encoding a soluble glycosyltransferase is obtainable from a naturally occurring
genomic DNA or a cDNA according to methods known in the art. For example, the partial
DNA coding for a soluble form of a glycosyltransferase may be excised from the
full-length DNA coding for the corresponding membrane-bound glycosyltransferase by
using restriction enzymes. The availability of an appropriate restriction site is 
advantageous therefor. 

Furthermore, DNA encoding a glycosyltransferase can be enzymatically or chemically 
synthesized. 

A variant of a glycosyltransferase having enzymatic activity and an amino acid sequence 
in which one or more amino acids are deleted (DNA fragments) and/or exchanged with 
one or more other amino acids, is encoded by a mutant DNA. Furthermore, a mutant DNA
is intended to include a silent mutant wherein one or more nucleotides are replaced with
other nucleotides, the new codons coding for the same amino acid(s). Such a mutant
sequence is also a degenerated DNA sequence. Degenerated DNA sequences are
degenerated within the meaning of the genetic code in that an unlimited number of
nucleotides are replaced by other nucleotides without resulting in a change of the amino
acid sequence originally encoded. Such degenerated DNA sequences may be useful due to
their different restriction sites and/or frequency of particular codons which are preferred
by the specific host to obtain optimal expression of a glycosyltransferase. Preferably, such
DNA sequences have the yeast preferred codon usage. 
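
The notion of a silent mutant (a degenerated DNA sequence) can be made concrete with a short sketch. Both DNA variants below are illustrative assumptions, not sequences from the sequence listing; they are chosen so that each encodes the peptide Arg-Ala-Arg-Ile-Arg-Arg-Pro-Ala under the standard genetic code. Which codons a given yeast host actually prefers is not stated in the text and is not asserted here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence (length divisible by 3) into one-letter code."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equally long sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical DNA variants encoding the same eight-residue peptide.
variant_1 = "AGAGCTAGAATTAGAAGACCAGCT"
variant_2 = "CGTGCCCGTATCCGTCGTCCGGCC"

print(translate(variant_1), translate(variant_2))
print(count_differences(variant_1, variant_2))
```

The two variants differ at half of their nucleotide positions yet translate to the same peptide, which is the property the text calls degenerated within the meaning of the genetic code.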

A mutant DNA is obtainable by in vitro mutation of a cDNA or of a naturally occurring
genomic DNA according to methods known in the art.

The invention also concerns hybrid vectors comprising a DNA sequence encoding a 
hybrid protein of the invention. The hybrid vectors of the invention provide for replication
and, optionally, expression of the DNA encoding a hybrid protein of the invention. A
hybrid vector of the invention comprises a DNA sequence encoding a hybrid protein of the 
invention linked with an origin of replication allowing the replication of the vector in the 
host cell, or a functionally equivalent sequence. A vector suitable for the expression of the 
hybrid protein of the invention (an expression vector) comprises a DNA sequence 
encoding said hybrid protein operably linked with expression control sequences, e.g. 
promoters, which ensure the effective expression of the hybrid proteins in yeast, and an 
origin of replication allowing the replication of the vector in the host cell, or a functionally 
equivalent sequence. 

Vectors suitable for replication and expression in yeast contain a yeast replication origin. 
Hybrid vectors that contain a yeast replication origin, for example the chromosomal 
autonomously replicating segment (ars), are retained extrachromosomally within the yeast 
cell after transformation and are replicated autonomously during mitosis. Also, hybrid 
vectors that contain sequences homologous to the yeast 2µ plasmid DNA can be used.
Such hybrid vectors are integrated by recombination in 2µ plasmids already present within
the cell, or replicate autonomously. 

Preferably, the hybrid vectors according to the invention include one or more, especially 
one or two, selective genetic markers for yeast and such a marker and an origin of 
replication for a bacterial host, especially Escherichia coli.

As to the selective gene markers for yeast, any marker gene can be used which facilitates 
the selection for transformants due to the phenotypic expression of the marker gene.
Suitable markers for yeast are, for example, those expressing antibiotic resistance or, in
the case of auxotrophic yeast mutants, genes which complement host lesions.
Corresponding genes confer, for example, resistance to the antibiotics G418, hygromycin
or bleomycin or provide for prototrophy in an auxotrophic yeast mutant, for example the
URA3, LEU2, LYS2 or TRP1 gene.

As the amplification of the hybrid vectors is conveniently done in E. coli, an E. coli
genetic marker and an E. coli replication origin are included advantageously. These can be
obtained from E. coli plasmids, such as pBR322 or a pUC plasmid, for example pUC18 or
pUC19, which contain both E. coli replication origin and E. coli genetic marker conferring
resistance to antibiotics, such as ampicillin. 

An expression vector according to the invention comprises an expression cassette
comprising a yeast promoter and a DNA sequence coding for a hybrid protein of the
invention, which DNA sequence is controlled by said promoter. 

In a first embodiment, an expression vector according to the invention comprises an 
expression cassette comprising a yeast promoter, a DNA sequence coding for a hybrid 
protein, which DNA sequence is controlled by said promoter, and a DNA sequence 
containing yeast transcription termination signals. 

In a second embodiment, an expression vector according to the invention comprises an
expression cassette comprising a yeast promoter operably linked to a first DNA sequence 
encoding a signal peptide linked in the proper reading frame to a second DNA sequence 
encoding a hybrid protein, and a DNA sequence containing yeast transcription termination 
signals. 

The yeast promoter may be a regulated or a constitutive promoter preferably derived from
a highly expressed yeast gene, especially a Saccharomyces cerevisiae gene. Thus, the
promoter of the TRP1 gene, the ADHI or ADHII gene, the acid phosphatase (PHO5) gene,
a promoter of the yeast mating pheromone genes coding for the a- or α-factor or a
promoter derived from a gene encoding a glycolytic enzyme such as the promoter of the
enolase, glyceraldehyde-3-phosphate dehydrogenase (GAP), 3-phosphoglycerate kinase
(PGK), hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate
isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase,
phosphoglucose isomerase or glucokinase genes can be used. Furthermore, it is possible to
use hybrid promoters comprising upstream activation sequences (UAS) of one yeast gene
and downstream promoter elements including a functional TATA box of another yeast
gene, for example a hybrid promoter including the UAS(s) of the yeast PHO5 gene and
downstream promoter elements including a functional TATA box of the yeast GAP gene
(PHO5-GAP hybrid promoter). Preferred is the PHO5 promoter, e.g. a constitutive PHO5
promoter such as a shortened acid phosphatase PHO5 promoter devoid of the upstream
regulatory elements (UAS). Particularly preferred is the PHO5(-173) promoter element
starting at nucleotide -173 and ending at nucleotide -9 of the PHO5 gene.

The DNA sequence encoding a signal peptide ("signal sequence") is preferably derived 
from a yeast gene coding for a polypeptide which is ordinarily secreted. Other signal 
sequences of heterologous proteins which are ordinarily secreted can also be chosen.
Yeast signal sequences are, for example, the signal and prepro sequences of the yeast
invertase, α-factor, pheromone peptidase (KEX1), "killer toxin" and repressible acid
phosphatase (PHO5) genes and the glucoamylase signal sequence from Aspergillus
awamori. Alternatively, fused signal sequences may be constructed by ligating part of the
signal sequence (if present) of the gene naturally linked to the promoter used (for example
PHO5), with part of the signal sequence of another heterologous protein. Those
combinations are favoured which allow a precise cleavage between the signal sequence and the
glycosyltransferase amino acid sequence. Additional sequences, such as pro- or spacer
sequences which may or may not carry specific processing signals can also be included in
the constructions to facilitate accurate processing of precursor molecules. Alternatively,
fused proteins can be generated containing internal processing signals which allow proper
maturation in vivo or in vitro. For example, the processing signals contain Lys-Arg, which
is recognized by a yeast endopeptidase located in the Golgi membranes.
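
How an internal Lys-Arg processing signal divides a precursor can be sketched as follows. The sketch assumes that cleavage occurs on the C-terminal side of each Lys-Arg dipeptide, which the text does not state explicitly; the precursor sequence and the function name are hypothetical, and real endopeptidase specificity depends on more than the dipeptide alone.

```python
def cleave_after_lys_arg(precursor: str) -> list[str]:
    """Split a one-letter protein sequence after every Lys-Arg ('KR') dipeptide."""
    fragments = []
    start = 0
    site = precursor.find("KR", start)
    while site != -1:
        # Keep the dipeptide at the C-terminus of the upstream fragment.
        fragments.append(precursor[start:site + 2])
        start = site + 2
        site = precursor.find("KR", start)
    if start < len(precursor):
        fragments.append(precursor[start:])
    return fragments


# Hypothetical precursor: a leader ending in Lys-Arg followed by placeholder residues.
print(cleave_after_lys_arg("MLLSAKRGTGT"))
```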

A DNA sequence containing yeast transcription termination signals is preferably the 3' 
flanking sequence of a yeast gene which contains proper signals for transcription 
termination and polyadenylation. Suitable 3' flanking sequences are for example those of 
the yeast gene naturally linked to the promoter used. The preferred flanking sequence is 
that of the yeast PHO5 gene.

If a hybrid protein comprising a membrane-bound glycosyltransferase is expressed in
yeast, the preferred yeast hybrid vector comprises an expression cassette comprising a
yeast promoter, a DNA sequence encoding said hybrid protein, which DNA sequence is
controlled by said promoter, and a DNA sequence containing yeast transcription
termination signals. If the DNA encodes a hybrid protein comprising a membrane-bound
glycosyltransferase there is no need for an additional signal sequence.

In case the hybrid protein to be expressed comprises two soluble glycosyltransferases, the 
preferred yeast hybrid vector comprises an expression cassette comprising a yeast 
promoter operably linked to a first DNA sequence encoding a signal peptide linked in the 
proper reading frame to a second DNA sequence encoding the hybrid protein, and a DNA
sequence containing yeast transcription termination signals. 
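
The two cassette layouts described above differ only in whether a signal peptide coding sequence precedes the hybrid protein coding sequence. A minimal sketch of that decision, with a hypothetical function name and plain-text element labels chosen for illustration:

```python
def cassette_elements(contains_membrane_bound_enzyme: bool) -> list[str]:
    """List the cassette elements in 5' to 3' order.

    A hybrid protein that includes a membrane-bound glycosyltransferase
    carries its own signal-anchor domain, so no separate signal sequence
    is added; a hybrid of two soluble enzymes gets one.
    """
    elements = ["yeast promoter"]
    if not contains_membrane_bound_enzyme:
        elements.append("signal peptide coding sequence")
    elements.append("hybrid protein coding sequence")
    elements.append("yeast transcription termination signals")
    return elements


print(cassette_elements(contains_membrane_bound_enzyme=True))
print(cassette_elements(contains_membrane_bound_enzyme=False))
```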

WO 94/12646                                                    PCT/EP93/03194

The hybrid vectors according to the invention are prepared by methods known in the art,
for example by linking the expression cassette comprising a yeast promoter and a DNA
sequence coding for a glycosyltransferase, or a variant thereof, which DNA sequence is
controlled by said promoter, or the several constituents of the expression cassette, and the
DNA fragments containing selective genetic markers for yeast and for a bacterial host and
origins of replication for yeast and for a bacterial host in the predetermined order, i.e. in a
functional array.

The hybrid vectors of the invention are used for the transformation of the yeast strains 
described below. 

The invention concerns furthermore a yeast strain which has been transformed with a 
hybrid vector of the invention. 

Suitable yeast host organisms are strains of the genus Saccharomyces, especially strains of
Saccharomyces cerevisiae. Said yeast strains include strains which, optionally, have been
cured of endogenous two-micron plasmids and/or which optionally lack yeast peptidase
activity(ies), e.g. peptidase yscα, yscA, yscB, yscY and/or yscS activity.

The yeast strains of the invention are used for the preparation of a hybrid protein of the 
invention. 

The transformation of yeast with the hybrid vectors according to the invention is 
accomplished by methods known in the art, for example according to the methods
described by Hinnen et al. (Proc. Natl. Acad. Sci. USA (1978) 75, 1929) and Ito et al.
(J. Bacteriol. (1983) 153, 163-168).

The transformed yeast strains are cultured using methods known in the art.

Thus, the transformed yeast strains according to the invention are cultured in a liquid 
medium containing assimilable sources of carbon, nitrogen and inorganic salts.

Various carbon sources are usable. Examples of preferred carbon sources are assimilable 
carbohydrates, such as glucose, maltose, mannitol, fructose or lactose, or an acetate such 
as sodium acetate, which can be used either alone or in suitable mixtures. Suitable
nitrogen sources include, for example, amino acids, such as casamino acids, peptides and
proteins and their degradation products, such as tryptone, peptone or meat extracts,
furthermore yeast extract, malt extract, corn steep liquor, as well as ammonium salts, such as
ammonium chloride, sulphate or nitrate which can be used either alone or in suitable
mixtures. Inorganic salts which may be used include, for example, sulphates, chlorides,
phosphates and carbonates of sodium, potassium, magnesium and calcium. Additionally, 
the nutrient medium may also contain growth promoting substances. Substances which 
promote growth include, for example, trace elements, such as iron, zinc, manganese and 
the like, or individual amino acids. 

Due to the incompatibility between the endogenous two-micron DNA and hybrid vectors 
carrying its replicon, yeast cells transformed with such hybrid vectors tend to lose the 
latter. Such yeast cells have to be grown under selective conditions, i.e. conditions which 
require the expression of a plasmid-encoded gene for growth. Most selective markers
currently in use and present in the hybrid vectors according to the invention (infra) are genes
coding for enzymes of amino acid or purine biosynthesis. This makes it necessary to use
synthetic minimal media deficient in the corresponding amino acid or purine base.
However, genes conferring resistance to an appropriate biocide may be used as well [e.g. a
gene conferring resistance to the amino-glycoside G418]. Yeast cells transformed with
vectors containing antibiotic resistance genes are grown in complex media containing the
corresponding antibiotic whereby faster growth rates and higher cell densities are reached.

Hybrid vectors comprising the complete two-micron DNA (including a functional origin 
of replication) are stably maintained within strains of Saccharomyces cerevisiae which are
devoid of endogenous two-micron plasmids (so-called cir⁰ strains) so that the cultivation
can be carried out under non-selective growth conditions, i.e. in a complex medium. 

Yeast cells containing hybrid plasmids with a constitutive promoter express the DNA 
encoding a glycosyltransferase, or a variant thereof, controlled by said promoter without 
induction. However, if said DNA is under the control of a regulated promoter the 
composition of the growth medium has to be adapted in order to obtain maximum levels 
of mRNA transcripts, e.g. when using the PHO5 promoter the growth medium must
contain a low concentration of inorganic phosphate for derepression of this promoter. 

The cultivation is carried out by employing conventional techniques. The culturing 
conditions, such as temperature, pH of the medium and fermentation time are selected in 
such a way that maximal levels of the heterologous protein are produced. A chosen yeast 
strain is e.g. grown under aerobic conditions in submerged culture with shaking or stirring
at a temperature of about 25° to 35°C, preferably at about 28°C, at a pH value of from 4 to
7, for example at approximately pH 5, and for at least 1 to 3 days, preferably as long as
satisfactory yields of protein are obtained.

After expression in yeast the hybrid protein of the invention is either accumulated inside 
the cells or secreted by the cells. In the latter case the hybrid protein is found within the 
periplasmic space and/or in the culture medium. The enzymatic activity may be recovered
e.g. by obtaining the protein from the cell or the culture supernatant by conventional 
means. 

For example, the first step usually consists in separating the cells from the culture fluid by 
centrifugation. In case the hybrid protein has accumulated within the cells, the enzymatic 
activity is recovered by cell disruption. Yeast cells can be disrupted in various ways 
well-known in the art: e.g. by exerting mechanical forces such as shaking with glass 
beads, by ultrasonic vibration, osmotic shock and/or by enzymatic digestion of the cell 
wall. If desired, the crude extracts thus obtainable can be directly used for glycosylation. 
Further enrichment may be achieved for example by differential centrifugation of the cell 
extracts and/or treatment with a detergent, such as Triton. 

In case the hybrid protein is secreted by the yeast cell into the periplasmic space, a 
simplified isolation protocol can be used: the protein is isolated without cell lysis by 
enzymatic removal of the cell wall or by chemical agents, e.g. thiol reagents or EDTA, 
which give rise to cell wall damage permitting the produced hybrid protein to be
released. In case the hybrid protein of the invention is secreted into the culture broth, the
enzymatic activity can be isolated directly therefrom.

Methods suitable for the purification of the crude hybrid protein include standard
chromatographic procedures such as affinity chromatography, for example with a suitable
substrate, antibodies or Concanavalin A, ion exchange chromatography, gel filtration,
partition chromatography, HPLC, electrophoresis, precipitation steps such as ammonium 
sulfate precipitation and other processes, especially those known from the literature. 

In order to detect glycosyltransferase activity, assays known from the literature can be
used. For example, galactosyltransferase activity can be measured by determining the
amount of radioactively labelled galactose incorporated into a suitable acceptor molecule
such as a glycoprotein or a free sugar residue. Analogously, sialyltransferase activity may
be assayed e.g. by the incorporation of sialic acid into a suitable substrate. For a hybrid
protein exhibiting two different glycosyltransferase activities the activities may be
assessed individually or together in a 'single pot assay'.

A hybrid protein of the invention is useful e.g. for the synthesis or modification of 
glycoproteins, oligosaccharides and glycolipids. If the hybrid molecule comprises two 
different glycosyltransferase activities glycosylation in a one pot reaction is preferred. 

The invention especially concerns the hybrid proteins, the recombinant DNA molecules
coding therefor, the hybrid vectors and the transformed yeast strains, and the processes for
the preparation thereof, as described in the Examples.

In the Examples, the following abbreviations are used: GT = galactosyltransferase
(EC 2.4.1.22); PCR = polymerase chain reaction; ST = sialyltransferase (EC 2.4.99.1).



Example 1: Cloning of the galactosyltransferase (GT) cDNA from HeLa cells

GT cDNA is isolated from HeLa cells (Watzele, G. and Berger, E.G. (1990)
Nucleic Acids Res. 18, 7174) by the polymerase chain reaction (PCR) method:

1.1 Preparation of poly(A)+RNA from HeLa cells

For RNA preparation HeLa cells are grown in monolayer culture on 5 plates (23x23 cm). 
The rapid and efficient isolation of RNA from cultured cells is performed by extraction 
with guanidine-HCl as described by MacDonald, R.J. et al. (Meth. Enzymol. (1987) 152, 
226-227). Generally, yields are about 0.6 - 1 mg total RNA per plate of confluent cells. 
Enrichment of poly(A)+RNA is achieved by affinity chromatography on oligo(dT)-cellulose 
according to the method described in the Maniatis manual (Sambrook, J., Fritsch, E.F. 
and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual (2nd edition), Cold 
Spring Harbor Laboratory Press, Cold Spring Harbor, USA), applying 4 mg of total RNA 
on a 400 µl column. 3 % of the loaded RNA is recovered as enriched poly(A)+RNA, 
which is stored in aliquots precipitated with 3 volumes of ethanol at -70°C until it is used. 
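
The quantities quoted above can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the function names are illustrative and not part of the protocol, and the 3 % recovery is applied as a simple mass fraction of the RNA loaded.

```python
def total_rna_range_mg(plates, low_mg_per_plate=0.6, high_mg_per_plate=1.0):
    """Expected total RNA (mg) from a number of confluent plates."""
    return plates * low_mg_per_plate, plates * high_mg_per_plate


def poly_a_rna_ug(total_rna_loaded_mg, recovery_fraction=0.03):
    """Poly(A)+RNA (µg) recovered from the oligo(dT)-cellulose column."""
    return total_rna_loaded_mg * 1000.0 * recovery_fraction


if __name__ == "__main__":
    low, high = total_rna_range_mg(5)
    print(f"total RNA from 5 plates: {low:.1f} - {high:.1f} mg")
    print(f"poly(A)+RNA from 4 mg loaded: {poly_a_rna_ug(4):.0f} µg")
```

Five plates are thus expected to give 3 - 5 mg of total RNA, so the 4 mg applied to the column lies within that range, and a 3 % recovery corresponds to roughly 120 µg of poly(A)+RNA.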

1.2 First strand cDNA synthesis for PCR 

Poly(A)+RNA (mRNA) is reverse-transcribed into DNA by Moloney Murine Leukemia 
Virus RNase H- Reverse Transcriptase (M-MLV RT) (BRL). In setting up the 20 µl 
reaction mix, the protocol provided by BRL is followed with minor variations: 1 µg of 
HeLa cell poly(A)+RNA and 500 ng oligo(dT)12-18 (Pharmacia) in 11.5 µl sterile H2O are 
heated to 70°C for 10 min and then quickly chilled on ice. Then 4 µl reaction buffer 
provided by BRL (250 mM Tris-HCl pH 8.3, 375 mM KCl, 15 mM MgCl2), 2 µl 0.1 M 
dithiothreitol, 1 µl mixed dNTP (10 mM each dATP, dCTP, dGTP, dTTP, Pharmacia), 
0.5 µl (17.5 U) RNAguard (RNase inhibitor of Pharmacia) and 1 µl (200 U) M-MLV H- RT 
are added. The reaction is carried out at 42°C and stopped after 1 h by heating the tube to 
95°C for 10 min. 
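
As a plausibility check, the component volumes listed above can be summed and the stock concentrations converted to final concentrations. The sketch below assumes simple dilution into the combined volume; the dictionary keys and function names are illustrative only.

```python
# Volumes (µl) of the components of the reverse transcription mix
# as listed in Example 1.2.
RT_MIX_UL = {
    "poly(A)+RNA and oligo(dT) in H2O": 11.5,
    "reaction buffer (BRL)": 4.0,
    "0.1 M dithiothreitol": 2.0,
    "mixed dNTP (10 mM each)": 1.0,
    "RNAguard": 0.5,
    "M-MLV H- RT": 1.0,
}


def total_volume_ul(mix):
    """Sum of all component volumes in µl."""
    return sum(mix.values())


def final_concentration(stock, added_ul, total_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * added_ul / total_ul


if __name__ == "__main__":
    total = total_volume_ul(RT_MIX_UL)
    print(f"total volume: {total} µl")
    print(f"Tris-HCl: {final_concentration(250, 4, total)} mM")
    print(f"KCl: {final_concentration(375, 4, total)} mM")
    print(f"MgCl2: {final_concentration(15, 4, total)} mM")
    print(f"dithiothreitol: {final_concentration(100, 2, total)} mM")
    print(f"each dNTP: {final_concentration(10, 1, total)} mM")
```

The volumes add up to the stated 20 µl, and the buffer is diluted fivefold in the final mix.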

In order to check the efficiency of the reaction an aliquot of the mixture (5 µl) is incubated 
in the presence of 2 µCi α-32P dCTP. By measuring the incorporated dCTP, the amount of 
cDNA synthesized is calculated. The yield of first strand synthesis is routinely between 5 
and 15 %. 
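
The stated yield can be translated into an approximate cDNA concentration in the reaction mix. This is a sketch under the assumption that the yield is expressed as mass of cDNA relative to the mass of mRNA put into the 20 µl reaction; the function name is illustrative.

```python
def cdna_ng_per_ul(mrna_input_ug, yield_fraction, reaction_volume_ul=20.0):
    """Approximate first strand cDNA concentration (ng/µl) in the reaction mix."""
    return mrna_input_ug * 1000.0 * yield_fraction / reaction_volume_ul


if __name__ == "__main__":
    for yield_fraction in (0.05, 0.15):
        conc = cdna_ng_per_ul(1.0, yield_fraction)
        print(f"yield {yield_fraction:.0%}: {conc:.1f} ng cDNA per µl")
```

Under this assumption 1 µl of the reaction contains roughly 2.5 to 7.5 ng of first strand cDNA, which agrees with the figure of about 5 ng per µl given for the PCR template in Example 1.3.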

1.3 Polymerase chain reaction 

The oligodeoxynucleotide primers used for PCR are synthesized in vitro by the phosphoramidite 
method (M.H. Caruthers, in Chemical and Enzymatic Synthesis of Gene Fragments, 
H.G. Gassen and A. Lang, eds., Verlag Chemie, Weinheim, FRG) on an Applied 
Biosystems Model 380B synthesizer. They are listed in Table 1. 



Table 1: PCR-primers 

primer          sequence (5' to 3') 1)                       corresponding to bp in GT cDNA 2) 

P1up (KpnI)     cgcggtACCCrrOTAAAGCGGCGGCGGG                 (-26) - 3 
P1 (EcoRI)      gccgaattcATGAGGCTTCGGGAGCCGCTCCTGAGCG        1 - 28 
P3 (SacI)       CTGGAGCTCGTGGCAAAGCAGAACCC                   448 - 473 
P2d (EcoRI)     gccgaaTTCAGTCTTTACCTGTACCAAAAGTCCTA          1222 - 1192 
P4 (HindIII)    cccaagctTGGAATGATGATGGCCACCTTGTGAGG          546 - 520 

1) Capital letters represent sequences from GT, small letters are additional sequences, sites for restriction enzymes 
are underlined. Codons for 'start' and 'stop' of RNA translation are highlighted in boldface. 
2) GT cDNA sequence from human placenta published in GenBank (Accession No. M22921). 
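
A quick way to sanity-check primers of this kind is to confirm that each carries the recognition site of the enzyme named for it and that the number of GT-derived (capital) nucleotides matches the quoted coordinate range. The sketch below does this for primers P3 and P4 as printed in Table 1; the helper names are illustrative, and only the well-known recognition sequences of SacI (GAGCTC) and HindIII (AAGCTT) are used.

```python
RESTRICTION_SITES = {"SacI": "GAGCTC", "HindIII": "AAGCTT"}

# (name, enzyme, sequence as printed, first bp, last bp in GT cDNA)
PRIMERS = [
    ("P3", "SacI", "CTGGAGCTCGTGGCAAAGCAGAACCC", 448, 473),
    ("P4", "HindIII", "cccaagctTGGAATGATGATGGCCACCTTGTGAGG", 546, 520),
]


def has_site(primer, enzyme):
    """True if the primer contains the recognition site of the enzyme."""
    return RESTRICTION_SITES[enzyme] in primer.upper()


def gt_derived_length(primer):
    """Number of capital letters, i.e. nucleotides taken from GT."""
    return sum(1 for base in primer if base.isupper())


def span_length(first_bp, last_bp):
    """Number of positions covered, regardless of primer orientation."""
    return abs(last_bp - first_bp) + 1


if __name__ == "__main__":
    for name, enzyme, seq, first, last in PRIMERS:
        print(name, enzyme, has_site(seq, enzyme),
              gt_derived_length(seq), span_length(first, last))
```

For both primers the recognition site is present and the count of capital letters equals the length of the stated range (26 and 27 nucleotides, respectively).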



Standard PCR-conditions for a 30 µl incubation mixture are: 1 µl of the Reverse Transcriptase 
reaction (see Example 1.2), containing about 5 ng first strand cDNA, 15 pmol 
each of the relevant primers, 200 µmol each of the four deoxynucleoside triphosphates 
(dATP, dCTP, dGTP and dTTP) in PCR-buffer (10 mM Tris-HCl pH 8.3 (at 23°C), 
50 mM KCl, 1.5 mM MgCl2, 0.001 % gelatine) and 0.5 U AmpliTaq Polymerase (Perkin 
Elmer). The amplification is performed in the Thermocycler 60 (Biomed) using the 
following conditions: 0.5 min denaturing at 95°C, 1 min annealing at 56°C, and 1 min 
15 sec extension at 72°C, for a total of 20 - 25 cycles. In the last cycle, primer extension at 
72°C is carried out for 5 min. 
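
The cycling programme described above can be written down as data and its net run time estimated. This is a minimal sketch: the function name and parameter names are illustrative, and the ramp times of the thermocycler between temperatures are not included.

```python
def pcr_run_time_s(cycles, denature_s=30, anneal_s=60, extend_s=75,
                   final_extend_s=300):
    """Net incubation time in seconds; the last cycle uses the long extension."""
    if cycles < 1:
        raise ValueError("at least one cycle is required")
    regular_cycle = denature_s + anneal_s + extend_s
    last_cycle = denature_s + anneal_s + final_extend_s
    return (cycles - 1) * regular_cycle + last_cycle


if __name__ == "__main__":
    for n in (20, 25):
        seconds = pcr_run_time_s(n)
        print(f"{n} cycles: {seconds} s = {seconds / 60:.2f} min")
```

Excluding ramping, 20 cycles take just under one hour and 25 cycles about 72 minutes.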

For sequencing and subcloning, the HeLa GT cDNA is amplified in two overlapping 
pieces, using different primer combinations: 

(1) Fragment P1 - P4: Primers P1 and P4 are used to amplify a DNA fragment covering 
nucleotide positions 7 - 555 in the nucleotide sequence depicted in SEQ ID NO. 1. 

(2) Fragment P3 - P2d: Primers P3 and P2d are used to amplify a DNA fragment 
covering nucleotide positions 457 - 1229 in the nucleotide sequence depicted in SEQ 
ID NO. 1. 
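
The region shared by the two fragments, and whether it contains the NotI site later used for joining them, can be derived from the coordinates alone. The sketch below uses the fragment ranges given above and the NotI position (497 to 504) from the feature list of SEQ ID NO. 1; the function names are illustrative.

```python
def overlap(a, b):
    """Inclusive overlap of two (start, end) ranges, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def contains(region, site):
    """True if the site lies completely within the region."""
    return region[0] <= site[0] and site[1] <= region[1]


P1_P4 = (7, 555)       # fragment P1 - P4
P3_P2D = (457, 1229)   # fragment P3 - P2d
NOTI_SITE = (497, 504)

if __name__ == "__main__":
    shared = overlap(P1_P4, P3_P2D)
    print("overlap:", shared, "length:", shared[1] - shared[0] + 1)
    print("NotI site within overlap:", contains(shared, NOTI_SITE))
```

The two fragments share positions 457 to 555 (99 bp), and the NotI site falls inside this overlap.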

In order to avoid errors during amplification four independent PCRs are carried out for 
each fragment. Also primer P1up (KpnI) in combination with primer P4 is used to 
determine the DNA sequence followed by the 'start' codon. 

After PCR amplification, fragment P1 - P4 is digested with the restriction enzymes EcoRI 
and HindIII, analysed on a 1.2 % agarose gel, eluted from the gel by GENECLEAN 
(BIO 101) and subcloned into the vector pUC18 (Pharmacia), digested with the same 
enzymes. Fragment P3 - P2d is digested with SacI and EcoRI, analysed on a 1.2 % gel, 
eluted and subcloned into pUC18, digested with SacI and EcoRI. The resulting subclones 
are pUC18/P1 - P4 and pUC18/P3 - P2d, respectively. For subcloning, ligation and 
transformation of E. coli strain DH5α, standard protocols are followed as described in 
Example 2. Minipreparations of plasmids pUC18/P1 - P4 and pUC18/P3 - P2d are used 
for dideoxy-sequencing of denatured double-stranded DNA with the T7 polymerase 
sequencing kit (Pharmacia). M13/pUC sequencing primers and reverse sequencing 
primers (Pharmacia) are applied to sequence overlapping fragments produced from both 
DNA strands by digestion with various restriction enzymes. Further subcloning of 
restriction fragments of the GT gene is necessary for extensive sequencing of overlapping 
fragments of both strands. The sequence of fragments amplified by independent PCRs 
shows that the error of amplification is less than 1 in 3000 nucleotides. The complete 
nucleotide sequence of the HeLa cell GT cDNA which is presented in SEQ ID NO. 1 is 
99.2 % homologous to that of human placenta (Genbank Accession No. M22921). Three 
differences are found: 

(a) Three extra base pairs at nucleotide positions 37-39 (SEQ ID NO. 1) resulting in one 
extra amino acid (Ser) in the N-terminal region of the protein; (b) bp 98 to 101 are 
'CTCT' instead of 'TCTG' in the sequence of human placenta, leading to two 
conservative amino acid substitutions (Ala Leu instead of Val Tyr) at amino acid 
positions 31 and 32 in the membrane spanning domain of GT; (c) the nucleotide at 
position 1047 is changed from 'A' to 'G' without ensuing a change in amino acid 
sequence. 

The two overlapping DNA-fragments P1 - P4 and P3 - P2d encoding the HeLa GT cDNA 
are joined via the NotI restriction site at nucleotide position 498 which is present in both 
fragments. 

The complete HeLa cell GT cDNA is cloned as a 1.2 kb EcoRI-EcoRI restriction fragment 
in plasmid pIC-7, a derivative of pUC8 with additional restriction sites in the multicloning 
site (Marsh, J.L., Erfle, M. and Wykes, E.J. (1984) Gene 32, 481-485), resulting in vector 
p4AD113. SEQ ID NO. 1 shows the DNA sequence of the EcoRI-HindIII fragment from 
plasmid p4AD113 comprising HeLa cell cDNA coding for full-length GT (EC 2.4.1.22), 
said fragment having the following features: 

from 6 to 1200 bp        cDNA sequence coding for HeLa cell galactosyltransferase 
from 1 to 6 bp           EcoRI site 
from 497 to 504 bp       NotI site 
from 1227 to 1232 bp     EcoRI site 
from 1236 to 1241 bp     EcoRV site 
from 1243 to 1248 bp     BglII site 

For the purpose of creating the GT expression cassette the EcoRI restriction site 
(bp 1227) at the 3' end of the cDNA sequence is deleted as follows: vector p4AD113 is 
first linearized by digestion with EcoRV and then treated with alkaline phosphatase. 
Furthermore, 1 µg of the linearised plasmid DNA is partially digested with 0.25 U EcoRI 
for 1 h at 37°C. After agarose gel electrophoresis a fragment corresponding to the size of 
the linearized plasmid (3.95 kb) is isolated from the gel by GENECLEAN (Bio 101). The 
protruding EcoRI end is filled in with Klenow polymerase as described in the Maniatis 
manual (supra). After phenolisation and ethanol precipitation the plasmid is religated and 
used to transform E. coli DH5α (Gibco/BRL). Minipreparations of plasmids are prepared 
from six transformants. The plasmids obtained are checked by restriction analysis for the 
absence of the EcoRI and EcoRV restriction sites at the 3' end of HeLa GT cDNA. The 
plasmid designated p4AE113 is chosen for the following experiments, its DNA sequence 
being identical to that of plasmid p4AD113, with the exception that bp 1232-1238 with the 
EcoRI-EcoRV restriction sites are deleted. 

Example 2: Construction of expression cassettes for full length GT 
For heterologous expression in Saccharomyces cerevisiae the full length HeLa GT cDNA 
sequence (SEQ ID NO. 1) is fused to transcriptional control signals of yeast for efficient 
initiation and termination of transcription. The promoter and terminator sequences 
originate from the yeast acid phosphatase gene (PHO5) (EP 100561). A short, 173 bp 
PHO5 promoter fragment is used, which is devoid of all regulatory elements and therefore 
behaves as a constitutive promoter. 

The GT cDNA sequence is combined with a yeast 5' truncated PHO5 promoter fragment 
and transcription terminator sequences as follows: 

(a) Full length HeLa GT cDNA sequence: 

Vector p4AE113 with the full length GT cDNA sequence is digested with the restriction 
enzymes EcoRI and BglII. The DNA fragments are electrophoretically separated on a 1 % 
agarose gel. A 1.2 kb DNA fragment containing the complete cDNA sequence for HeLa 
GT is isolated from the gel by adsorption to glassmilk, using the GENECLEAN kit 
(BIO 101). On this fragment the 'ATG' start codon for protein synthesis of GT is located 
directly behind the restriction site for EcoRI, whereas the stop codon 'TAG' is followed 
by 32 bp contributed by the 3' untranslated region of HeLa GT and the multiple cloning 
site of the vector with the BglII restriction site. 

(b) Vector for amplification in E. coli: 

The vector for amplification, plasmid p31R (cf. EP 100561), a derivative of pBR322, is 
digested with the restriction enzymes BamHI and SalI. The restriction fragments are 
separated on a 1 % agarose gel and a 3.5 kb vector fragment is isolated from the gel as 
described before. This DNA fragment contains the large SalI - HindIII vector fragment of 
the pBR322 derivative as well as a 337 bp PHO5 transcription terminator sequence in 
place of the HindIII - BamHI sequence of pBR322. 



(c) Construction of plasmid p31/PHO5(-173)RIT 
The 5' truncated PHO5 promoter fragment without phosphate regulatory elements is 
isolated from plasmid p31/PHO5(-173)RIT. 

Plasmid p31RIT12 (EP 288435) comprises the full length, regulated PHO5 promoter (with 
an EcoRI site introduced at nucleotide position -8) on a 534 bp BamHI - EcoRI fragment, 
followed by the coding sequence for the yeast invertase signal sequence (72 bp EcoRI - 
XhoI) and the PHO5 transcription termination signal (135 bp XhoI - HindIII) cloned in a 
tandem array between BamHI and HindIII of the pBR322 derived vector. 

The constitutive PHO5(-173) promoter element from plasmid pJDB207/PHO5(-173)-YHIR 
(EP 340170) comprises the nucleotide sequence of the yeast PHO5 promoter from 
nucleotide position -9 to -173 (BstEII restriction site), but has no upstream regulatory 
sequences (UASp). The PHO5(-173) promoter, therefore, behaves like a constitutive 
promoter. The regulated PHO5 promoter in plasmid p31RIT12 is replaced by the short, 
constitutive PHO5(-173) promoter element in order to obtain plasmid p31/PHO5(-173)RIT. 

Plasmids p31RIT12 (EP 288435) and pJDB207/PHO5(-173)-YHIR (EP 340170) are 
digested with restriction endonucleases SalI and EcoRI. The respective 3.6 kb and 0.4 kb 
SalI - EcoRI fragments are isolated on a 0.8 % agarose gel, eluted from the gel, ethanol 
precipitated and resuspended in H2O at a concentration of 0.1 pmoles/µl. Both DNA 
fragments are ligated and 1 µl aliquots of the ligation mix are used to transform E. coli 
HB101 (ATCC) competent cells. Ampicillin resistant colonies are grown individually in 
LB medium supplemented with ampicillin (100 µg/ml). Plasmid DNA is isolated according 
to the method of Holmes, D.S. et al. (Anal. Biochem. (1981) 114, 193) and analysed 
by restriction digests with SalI and EcoRI. The plasmid of one clone with the correct 
restriction fragments is referred to as p31/PHO5(-173)RIT. 

(d) Construction of plasmid pGTB1135 

Plasmid p31/PHO5(-173)RIT is digested with the restriction enzymes EcoRI and SalI. 
After separation on a 1 % agarose gel, a 0.45 kb SalI - EcoRI fragment (fragment (c)) is 
isolated from the gel by GENECLEAN (BIO 101). This fragment contains the 276 bp 
SalI-BamHI sequence of pBR322 and the 173 bp BamHI(BstEII)-EcoRI constitutive PHO5 
promoter fragment. The 0.45 kb SalI-EcoRI fragment is ligated to the 1.2 kb EcoRI - BglII 
GT cDNA (fragment (a)) and the 3.5 kb BamHI-SalI vector part for amplification in E. 
coli with the PHO5 terminator (fragment (b)) described above. 

The three DNA fragments (a) to (c) are ligated in a 12 µl ligation mixture: 100 ng of DNA 
fragment (a) and 30 ng each of fragments (b) and (c) are ligated using 0.3 U T4 
DNA ligase (Boehringer) in the supplied ligase buffer (66 mM Tris-HCl pH 7.5, 1 mM 
dithioerythritol, 5 mM MgCl2, 1 mM ATP) at 15°C for 18 hours. Half of the ligation mix 
is used to transform competent cells of E. coli strain DH5α (Gibco/BRL). For preparing 
competent cells and for transformation, the standard protocol as given in the Maniatis 
manual (supra) is followed. The cells are plated on selective LB-medium, supplemented 
with 75 µg/ml ampicillin and incubated at 37°C. 58 transformants are obtained. 
Minipreparations of plasmid are performed from six independent transformants by using 
the modified alkaline lysis protocol of Birnboim, H.C. and Doly, J. as described in the 
Maniatis manual (supra). The isolated plasmids are characterized by restriction analysis 
with four different enzymes (EcoRI, PstI, HindIII, SalI, also in combination). All six 
plasmids show the expected fragments. One correct clone is referred to as pGTB1135. 
Plasmid pGTB1135 contains the expression cassette with the full-length HeLa GT cDNA 
under the control of the constitutive PHO5(-173) promoter fragment, and the PHO5 
transcriptional terminator sequence. This expression cassette can be excised from vector 
pGTB1135 as a 2 kb SalI - HindIII fragment. 
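
To see how the mass amounts used in the three-fragment ligation relate to molar amounts, the fragment sizes can be converted with an average mass per base pair. The sketch below assumes about 650 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the text; the names are illustrative. It also adds up the parts of the expression cassette, taking the GT cDNA as 1200 bp.

```python
AVG_DALTON_PER_BP = 650.0  # assumed average mass of one base pair (g/mol)


def fragment_pmol(mass_ng, length_bp):
    """Approximate amount (pmol) of a double-stranded DNA fragment."""
    return mass_ng * 1000.0 / (length_bp * AVG_DALTON_PER_BP)


# fragment label: (mass in ng, approximate length in bp)
LIGATION = {
    "(a) GT cDNA": (100, 1200),
    "(b) vector with terminator": (30, 3500),
    "(c) promoter fragment": (30, 450),
}

# Parts of the SalI - HindIII cassette (bp): pBR322 SalI-BamHI sequence,
# PHO5(-173) promoter, GT cDNA (rounded), PHO5 terminator.
CASSETTE_PARTS_BP = (276, 173, 1200, 337)

if __name__ == "__main__":
    for label, (mass_ng, length_bp) in LIGATION.items():
        print(f"{label}: {fragment_pmol(mass_ng, length_bp):.3f} pmol")
    print("cassette size:", sum(CASSETTE_PARTS_BP), "bp")
```

On these assumptions fragments (a) and (c) are each present in several-fold molar excess over the vector fragment (b), and the parts of the cassette add up to about 2 kb, in line with the size given for the SalI - HindIII fragment.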

Example 3: Construction of plasmids pA1 and pA2 
3.1 PCR for site-directed mutagenesis 

In order to knock out the stop codon of the GT coding sequence and to allow for an in 
frame fusion with ST a frame shift mutation and a point mutation are introduced into the 
cDNA coding for HeLa GT. The oligonucleotide primers used for PCR are synthesized in 
vitro according to the phosphoramidite method (supra) and listed in Table 2. 

Table 2: PCR-primers 

primer          sequence (5' to 3') 1)          corresponding to bp in SEQ ID NO. 3 

P3 (SacI)       CTGGAGCTCGTGGCAAAGCAGAACCC      457 - 482 
P2A1 (BamHI)    ggggaTCCTAGCTCG-TGTCCC          1205 - 1189 
P2B1 (BamHI)    ggggaTCCCAGCTCG-TGTCCC          1205 - 1189 

1) Capital letters represent sequences from GT, small letters are additional sequences, sites for restriction enzymes are 
underlined. Codons for 'start' and 'stop' of RNA translation are highlighted in boldface. 

Standard PCR-conditions for a 30 µl incubation mixture are: 1 µl of the Reverse Transcriptase 
reaction mix containing about 5 ng first strand cDNA (see Example 1.2), 15 pmol 
each of the relevant primers, 200 µmol each of the four deoxynucleoside triphosphates 
(dATP, dCTP, dGTP and dTTP) in PCR-buffer (10 mM Tris-HCl pH 8.3 (at 23°C), 
50 mM KCl, 1.5 mM MgCl2, 0.001 % gelatine) and 0.5 U AmpliTaq Polymerase (Perkin 
Elmer). The amplification is performed in the Thermocycler 60 (Biomed) using the 
following conditions: 0.5 min denaturing at 95°C, 1 min annealing at 56°C, and 1 min 
15 sec extension at 72°C, for a total of 20 - 25 cycles. In the last cycle, primer extension at 
72°C is carried out for 5 min. 

For sequencing and subcloning, the HeLa GT cDNA is amplified as described above, 
yielding "mutated" fragments: 

(3) Fragment P3-P2A1: primers P3 and P2A1 are used to amplify a 0.77 kb fragment 
covering nucleotide positions 457-1205 in the sequence depicted in SEQ ID NO. 3. 

(4) Fragment P3-P2B1: primers P3 and P2B1 are used to amplify a 0.77 kb fragment 
covering nucleotide positions 457-1205 in the sequence depicted in SEQ ID NO. 3. 

3.2 Construction of plasmids pA1 and pA2 

Fragments P3-P2A1 and P3-P2B1, respectively, are amplified by PCR, digested with 
BamHI and SacI and subcloned into vector pUC18 (Pharmacia), digested with the same 
enzymes, to produce plasmids pA1 and pA2. 

Example 4: Cloning of the sialyltransferase (ST) cDNA from human HepG2 cells 
ST cDNA is isolated from HepG2 cells by PCR in analogy to GT cDNA. Preparation of 
poly(A)+RNA and first strand cDNA synthesis are performed as described in Example 1. 
The primers (Microsynth) listed in Table 3 are used for PCR. 

Table 3: PCR-primers 

primer                sequence (5' to 3') 1)                           corresponding to bp in ST cDNA 2) 

SIA1 (PstI/EcoRI)     cgctgcagaattcaaaATGATTCACACCAACCTGAAGAAAAAGT     1 - 28 
SIA3 (BamHI)          cgcggatCCTGTGCTTAGCAGTGAATGGTCCGGAAGCC           1218 - 1198 

1) Capital letters represent sequences from ST, small letters are additional sequences with sites for restriction enzymes 
(underlined). Codons for 'start' and 'stop' for protein synthesis are indicated in boldface. 
2) ST cDNA sequence from human placenta as published in EMBL Data Bank (Accession No. X17247). 

HepG2 ST cDNA can be amplified as one DNA fragment of 1.2 kb using the primers 
SIA1 and SIA3. PCR is performed as described for GT cDNA under slightly modified 
cycling conditions: 0.5 min denaturing at 95°C, 1 min 15 sec annealing at 56°C, and 
1 min 30 sec extension at 72°C, for a total of 25-35 cycles. In the last cycle, primer 
extension at 72°C is carried out for 5 min. 

After PCR amplification, the 1.2 kb fragment is digested with the restriction enzymes 
BamHI and PstI, analysed on a 1.2 % agarose gel, eluted from the gel and subcloned into 
the vector pUC18. The resulting subclone is designated pSIA2. The nucleotide sequence 
of the PstI-BamHI fragment from plasmid pSIA2 comprising HepG2 cDNA coding for 
full-length sialyltransferase is presented in SEQ ID NO. 3, said fragment having the 
following features: 

from 15 to 1232 bp       cDNA sequence coding for HepG2 cell sialyltransferase 
from 1 to 6 bp           PstI site 
from 6 to 11 bp          EcoRI site 
from 144 to 149 bp       EcoRI site 
from 1241 to 1246 bp     BamHI site. 

Example 5: Construction of plasmids pA1ST and pB1ST 

a) Plasmid pSIA2 is double digested using EcoRI/BamHI and the ensuing 1098 bp 
fragment (fragment (a)) is isolated. The fragment codes for a soluble ST designated 
ST(44-406) starting at amino acid position 44 (Glu) and extending to amino acid position 
406 (Cys) (SEQ ID NO. 4). 

b) Plasmids A1 and B1 are linearized by BamHI digestion, treated with alkaline 
phosphatase and separated from contaminating enzymes by gel electrophoresis using 
GENECLEAN (Bio 101). 

c) Fragment (a) is linked to fragment (b) by means of an adaptor sequence from equimolar 
amounts of the synthesized oligonucleotides (Microsynth): 

5' GATCCGTCGACCTGCAG 3' and 5' AATTCAGCAGGTCGACG 3' for the 
complementary strand. The oligonucleotides are annealed to each other by first heating to 
95°C and then slowly cooling to 20°C. Ligation is carried out in 12 µl of ligase buffer 
(66 mM Tris-HCl pH 7.5, 1 mM dithioerythritol, 5 mM MgCl2, 1 mM ATP) at 16°C for 
18 hours. The sequences at the junction of GT and ST are as follows: 

pA1ST:     BamHI     Adaptor (bold)     EcoRI 

GGG ACA CGA GCT AGG ATC CGT CGA CCT GCA GAA TTC CAG GTG 
Gly Thr Arg Ala Arg Ile Arg Arg Pro Ala Glu Phe Gln Val 

pB1ST: 

GGG ACA CGA GCT GGG ATC CGT CGA CCT GCA GAA TTC CAG GTG 
Gly Thr Arg Ala Gly Ile Arg Arg Pro Ala Glu Phe Gln Val 
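
The reading frame across the two junctions can be verified by translating the nucleotide sequences with the standard genetic code. The sketch below is illustrative: the helper names are not from the source, the codon following ATC is read as CGT in both constructs (as dictated by the adaptor oligonucleotide 5' GATCCGTCGACCTGCAG 3'), and the result is given in one-letter code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) into one-letter amino acids."""
    seq = dna.replace(" ", "").upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


PA1ST = "GGG ACA CGA GCT AGG ATC CGT CGA CCT GCA GAA TTC CAG GTG"
PB1ST = "GGG ACA CGA GCT GGG ATC CGT CGA CCT GCA GAA TTC CAG GTG"

if __name__ == "__main__":
    print("pA1ST:", translate(PA1ST))
    print("pB1ST:", translate(PB1ST))
```

Both junctions translate without a stop codon; the two constructs differ only in the fifth residue shown (Arg in pA1ST, Gly in pB1ST), in agreement with the amino acid lines above.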

The ligated plasmids pA1ST and pB1ST are transformed into E. coli strain DH5α. 
Plasmid DNA of 6 transformants from each transformation is isolated and digested with 
EcoRI to test the orientation of the BamHI insert. Plasmids with a 3900 bp together with 
a 700 bp EcoRI fragment are used for the next step. 

Example 6: Construction of the GT-ST expression vectors YEPGSTa and YEPGSTb 

6.1 Isolation of a NotI-BamHI fragment coding for the GT C-terminus fused to ST 
Plasmids pA1ST and pB1ST are linearised by cutting with NotI and then partially digested 
with BamHI. After gel electrophoresis a 1900 bp NotI-BamHI fragment coding for the GT 
C-terminus fused to ST is isolated. 

6.2 Construction of the YEPGTB vector 

The episomal yeast vector YEP352 (J.E. Hill et al., Yeast 2, 163-167, 1986) is used to 
construct the YEPGTB vector which contains the constitutive PHO5 promoter, the cDNA 
coding for full length GT and the PHO5 transcriptional terminator sequence. 
YEP352 is digested with the restriction enzymes SalI and HindIII at the multiple cloning 
site. After separation over a 0.8 % agarose gel the linearized vector is isolated as a 5.2 kb 
DNA fragment (vector part) from the gel with the GENECLEAN kit (Bio 101). Vector 
pGTB1135 (Example 2) is also digested with the restriction enzymes SalI and HindIII. A 
2.0 kb fragment containing the expression cassette with the constitutive promoter is 
isolated. Ligation of the yeast vector and the expression cassette is carried out as follows: in 
a 12 µl ligation mix, 80 ng of the vector part (5.2 kb fragment) is combined with 40 ng of 
the 2.0 kb SalI-HindIII fragment using 0.3 U ligase (Boehringer) in the supplied buffer (66 
mM Tris-HCl pH 7.5, 1 mM dithioerythritol, 5 mM MgCl2, 1 mM ATP) for 18 hours at 
15°C. The ligation mix is used to transform E. coli DH5α as described above. 24 
transformants are obtained. Four independent colonies are chosen for minipreparation of 
plasmids. The isolated plasmids are characterized by restriction analysis: all four analyzed 
plasmids (YEPGTB 21-24) show the expected restriction patterns. YEPGTB24 is used for 
further experiments. 

6.3 Isolation of the fragment coding for the N-terminal part of GT 

YEPGTB24 carrying the whole constitutive expression cassette for GT in the yeast-E. coli 
shuttle vector YEP352 is cut with NotI and HindIII and a 6.3 kb fragment is isolated after 
gel electrophoresis. 

6.4 PHO5-terminator sequence 

Plasmid p31RIT12 (EP 288435) is cut with BamHI and HindIII and a 400 bp fragment 
carrying the PHO5 terminator sequence is isolated. 

Fragments isolated as described in 6.1 (1.9 kb NotI-BamHI fragment), 6.3 (6.3 kb 
HindIII-NotI fragment) and 6.4 (0.4 kb BamHI-HindIII fragment) are ligated to yield 
plasmids YEPGSTa and YEPGSTb, respectively, which are transformed in the E. coli 
strain DH5α. Plasmid DNA of transformants carrying the predicted pattern of BamHI 
fragments with 5580 bp, 1375 bp, 1150 bp and 276 bp is used for yeast transformation. 
The nucleotide sequences of the cDNAs coding for the hybrid glycosyltransferases 
designated GT-STa and GT-STb are presented in SEQ ID NOs: 5 and 7, respectively, said 
sequences having the following common features: 

from 1 to 1188 bp        cDNA sequence coding for HeLa cell GT(1-396) (cf. SEQ ID NO. 1) 
from 1189 to 1212 bp     Adaptor 
from 1213 to 2301 bp     cDNA sequence coding for HepG2 cell 
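
The feature coordinates of the hybrid cDNAs can be checked for contiguity and for a preserved reading frame. The following sketch is illustrative only; the segment labels and helper names are not from the source, and the comparison with residues 44 to 406 relies on the description of the soluble ST fragment in Example 5.

```python
# (label, first bp, last bp) as listed for SEQ ID NOs: 5 and 7
SEGMENTS = [
    ("GT(1-396)", 1, 1188),
    ("adaptor", 1189, 1212),
    ("ST portion", 1213, 2301),
]


def codon_count(first_bp, last_bp):
    """Number of complete codons in an inclusive range."""
    length = last_bp - first_bp + 1
    if length % 3:
        raise ValueError("segment length is not a multiple of three")
    return length // 3


def is_contiguous(segments):
    """True if each segment starts right after the previous one ends."""
    return all(nxt[1] == prev[2] + 1
               for prev, nxt in zip(segments, segments[1:]))


if __name__ == "__main__":
    print("contiguous:", is_contiguous(SEGMENTS))
    for label, first, last in SEGMENTS:
        print(f"{label}: {codon_count(first, last)} codons")
```

The three segments are contiguous and in frame: 396 codons for GT, 8 codons for the adaptor region, and 363 codons for the ST part, which equals the number of residues from position 44 to position 406.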
Example 7: Transformation of yeast strain BT 150 

CsCl-purified DNA of the expression vectors YEPGSTa and YEPGSTb is prepared 
following the protocol of R. Treisman in the Maniatis manual (supra). The protease 
deficient S. cerevisiae strain BT 150 (MATα, his4, leu2, ura3, pra1, prb1, prc1, cps1) is 
transformed with about 1 µg of plasmids YEPGSTa and YEPGSTb, respectively, 
according to the lithium-acetate transformation method (Ito et al., J. Bact. (1983) 153, 
163-168). Approximately 200 transformants are obtained with YEPGSTa and YEPGSTb 
on SD plates (0.67 % yeast nitrogen base without amino acids, 2 % glucose, 2 % agarose 
supplemented with leucine (30 µg/ml) and histidine (20 µg/ml)). Single transformed yeast 
cells are selected and referred to as Saccharomyces cerevisiae BT 150/YEPGSTa and 
Saccharomyces cerevisiae BT 150/YEPGSTb, respectively. 

Example 8: Enzyme activity of the GT-ST hybrid proteins 

8.1 Preparation of cell extracts 

Cells of transformed Saccharomyces cerevisiae strains BT 150 are each grown under 
uracil selection in yeast minimal media (Difco) supplemented with histidine and leucine. 
The growth rate of the cells is not affected by the introduction of any of the expression 
vectors. Exponentially growing cells (at OD578 of 2.0) or stationary cells are collected by 
centrifugation, washed once with 50 mM Tris-HCl buffer pH 7.4 (buffer 1) and 
resuspended in buffer 1 at a concentration corresponding to 2 OD578. A 60 ml culture (120 
OD578) of cells is washed, pelleted and subjected to mechanical breakage by 
vigorous shaking on a vortex mixer with glass beads (0.45 - 0.5 mm diameter) for 4 min 
with intermittent cooling. The crude extracts are used directly for determination of 
enzyme activity. 

8.2 Protein assay 

The protein concentration is determined by use of the BCA-Protein Assay Kit (Pierce). 

8.3 Assay for GT activity 

GT activity can be measured with radiochemical methods using either ovalbumin, a 
glycoprotein which solely exposes GlcNAc as acceptor site, or free GlcNAc as acceptor 
substrates. Cell extracts (of 1 - 2 OD578 of cells) are assayed for 30 min at 37°C in a 
100 µl incubation mixture containing 35 mM Tris-HCl pH 7.4, 25 nCi UDP-14C-Gal 
(1.25 mCi/mmol), 1 µmol MnCl2, 2 % Triton X-100 and 1 mg ovalbumin or 20 mM 
GlcNAc as acceptor substrates. The reaction is terminated by acid precipitation of the 
protein and the amount of 14C galactose incorporated into ovalbumin is determined by 
liquid scintillation counting (Berger, E.G. et al. (1978) Eur. J. Biochem. 90, 213-222). For 
GlcNAc as acceptor substrate, the reaction is terminated by the addition of 0.4 ml ice cold 
H2O and the unused UDP-14C-galactose is separated from products on an anion 
exchange column (AG1-X8, BioRad) as described (Masibay, A.S. and Qasba, P.K. (1989) 
Proc. Natl. Acad. Sci. USA 86, 5733-5737). Assays are performed with and without 
acceptor molecules to assess the extent of hydrolysis of UDP-Gal by nucleotide 
pyrophosphatases. GT activity is determined in the crude extracts prepared from 
Saccharomyces cerevisiae BT 150/YEPGSTa and Saccharomyces cerevisiae 
BT 150/YEPGSTb. 
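
The amount of radiolabelled donor substrate in the assay, and the conversion of measured radioactivity into nmol of galactose transferred, follow from the specific activity. The sketch below uses the general conversion 1 nCi = 2220 dpm and the fact that mCi/mmol is numerically equal to nCi/nmol; the function names are illustrative, and counting efficiency is not taken into account.

```python
DPM_PER_NCI = 2220.0  # 1 Ci = 3.7e10 Bq = 2.22e12 disintegrations per minute


def nmol_from_activity(activity_nci, specific_activity_mci_per_mmol):
    """Amount of labelled compound (nmol); mCi/mmol equals nCi/nmol."""
    return activity_nci / specific_activity_mci_per_mmol


def concentration_mm(amount_nmol, volume_ul):
    """Concentration in mM (nmol per µl equals mmol per litre)."""
    return amount_nmol / volume_ul


def nmol_incorporated(dpm, specific_activity_mci_per_mmol):
    """Product formed (nmol) from disintegrations per minute."""
    return dpm / DPM_PER_NCI / specific_activity_mci_per_mmol


if __name__ == "__main__":
    donor_nmol = nmol_from_activity(25, 1.25)
    print(f"UDP-Gal per assay: {donor_nmol:.0f} nmol")
    print(f"UDP-Gal concentration: {concentration_mm(donor_nmol, 100):.1f} mM")
    print(f"total radioactivity: {25 * DPM_PER_NCI:.0f} dpm")
```

With the figures given, each 100 µl assay contains 20 nmol UDP-Gal (0.2 mM), and the difference in dpm between incubations with and without acceptor can be converted to nmol of galactose transferred with `nmol_incorporated`.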

8.4 Determination of optimum detergent activation 

The standard assay of GT activity according to Example 8.3 using 10 mM GlcNAc as 
acceptor substrate is carried out in presence of zero, 0.1, 0.5, 1.0, 2.0, 2.5 and 4 % Triton 
X-100 in the assay. 2 % Triton X-100 induces a two fold stimulation as compared with 
zero % Triton. 

8.5 Assay for lactose synthase activity 

The assay is carried out and terminated as indicated in Example 8.3 for GlcNAc as 
acceptor with the following modifications: instead of GlcNAc, 30 mM glucose is used as 
acceptor. Other ingredients include: 1 mg/ml human α-lactalbumin, 10 mM ATP. 
Optimum concentration of α-lactalbumin is determined in a range of 0 to 4 mg/ml 
α-lactalbumin. Maximum lactose synthase activity is observed at 1 mg/ml. 

8.6 Assay for ST activity 

ST activity can be determined by measuring the amount of radiolabeled sialic acid which 
is transferred from CMP-sialic acid to a glycoprotein acceptor. In case of the use of a 
glycoprotein as acceptor such as asialofetuin, the reaction is terminated by acid 
precipitation using 5 % (w/v) phosphotungstic acid and 5 % (w/v) trichloroacetic acid. The 
precipitate is filtered using glass fiber filters (Whatman GFA), washed extensively with 
ice-cold ethanol and assessed for radioactivity by liquid scintillation counting (Hesford et 
al. (1984), Glycoconjugate J. 1, 141-153). In case of the use of oligosaccharides as 
acceptors such as lactose or LacNAc (N-acetyllactosamine), the reaction is terminated by 
addition of 0.4 ml ice-cold H2O. The unused CMP-14C-sialic acid is retained on a 
1 ml-column of AG1-X8, phosphate form, 100-200 mesh. The column is washed with 
4.5 ml H2O and eluted with 24 ml 5 mM K2HPO4 buffer at pH 6.8. Eluant and wash 
solution are pooled and assessed for radioactivity by liquid scintillation counting. Standard 
conditions are as follows: 20 µl of yeast extracts (200 to 500 µg protein) are incubated 
with 300 µg asialofetuin in 2 mM imidazole buffer pH 7.4 and 3 nmoles CMP-14C-sialic acid 
(specific activity: 2.7 mCi/mmol), Triton X-100 0.5 %. ST-activity is found in the crude 
extracts prepared from Saccharomyces cerevisiae BT 150/YEPGSTa and Saccharomyces 
cerevisiae BT 150/YEPGSTb. 

8.7 Combined GT and ST activity 

Yeast extracts prepared from Saccharomyces cerevisiae BT 150/YEPGSTa and 
Saccharomyces cerevisiae BT 150/YEPGSTb are used to transfer Gal from UDPGal and 
sialic acid from CMPNeuAc to asialo-agalacto-α1 acid glycoprotein or GlcNAc according 
to the following conditions: 30 µl of extract, 20 µl of asialo-agalacto-α1 acid glycoprotein 
(prepared according to Hughes, R.C. and Jeanloz, R.W., (1966), Biochemistry 5, 
253-258), 2 mM of unlabeled UDPGal, 60 µM of CMP-14C-sialic acid (specific activity: 5.4 
mCi/mmol) in 2 mM imidazole buffer, pH 7.4. ST-activity is shown by incorporation of 
14C-sialic acid. Control incubation carried out in the absence of unlabeled UDPGal results 
in a 4 times lower incorporation of 14C-sialic acid. 

Similar incubations are carried out using 20 mM GlcNAc or 30 mM glucose (in presence 
of 0.1 mg/ml α-lactalbumin) as acceptor and isolating the product according to 8.6. Linear 
incorporations of 14C-sialic acid are observed during 180 min. The assay system contains 
in a final volume of 1 ml: 3 mmol glucose, 1 mg α-lactalbumin, 1 mM ATP, 1 mmol 
MnCl2, 20 mmol Tris-HCl, pH 7.4, 20 nmol UDPGal, 12 nmol CMP-14C-sialic acid (4.4 
mCi/mmol specific activity) and 350 µg protein (yeast extract). The reaction is terminated 
by adding 0.4 ml of ice-cold H2O. The mixture is passed over a 2 cm Bio-Rad Poly-Prep 
column containing AG1-X8 A6, 100-200 mesh, phosphate form. The column is washed 
with 4.5 ml H2O and eluted with 24 ml 5 mM K2HPO4 buffer at pH 6.8. 1 ml of the eluant 
is used for radioactivity measurement by liquid scintillation counting in 10 ml Instagel. 

8.8 Product identification of oligosaccharides synthesized by the GT-ST hybrid proteins 
8.8.1 Synthesis of 2,6 sialyllacNAc 

The incubation mixture contains in a volume of 1.57 ml: 20 mmol GlcNAc, 10 mM ATP,
1 mMol MnCl2, 5 mg Triton X-100, 200 mMol UDPGal, 30 mmol CMP 14C-sialic acid
(4.4 mCi/mmol specific activity) and 1000 µg protein (yeast extract prepared from
Saccharomyces cerevisiae BT 150/YEPGSTa and Saccharomyces cerevisiae BT
150/YEPGSTb, respectively). Incubation is carried out for 16 h at 37°C. The reaction is
terminated by adding 0.5 ml of H2O. The incubation mixture is separated on AG1-X8 as
described in Example 8.7. The total eluant of the anion exchange column is lyophilized.
Then, the residue is dissolved in 0.6 ml H2O followed by separation on a Biogel P2
column (200-400 mesh, 2x90 cm). The column is eluted with H2O at a temperature of
42.5°C at 5 ml/h. 0.5 ml fractions are collected and assessed for radioactivity in 100 µl
aliquots (to which 4 ml Instagel® is added for liquid scintillation counting). The peak
fractions containing 14C are pooled, lyophilized and repurified on AG1-X8 as described in
Example 8.7. The total eluant of 24 ml is lyophilized, the resulting residue dissolved in
300 µl H2O. This solution is subjected to preparative thin layer chromatography (Merck
Alu plates coated with silica gel 60 F254) in a solvent system containing
H2O/acetone/n-butanol 2/1.5/1.5 for 5 h and run against authentic standards including 50
mM sialyl 2,6-lactose and 2,6 sialyl LacNAc. After drying, the products and standards are
visualized using a spray containing 0.5 g thymol in 5 ml H2SO4 (96 %) and 95 ml ethanol
(96 %) followed by heating for 10 min at 130°C. The spots detected are found to be at
identical positions as the corresponding authentic standards.
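
Because the 14C-labelled donor has a stated specific activity of 4.4 mCi/mmol, radioactivity found in the pooled peak fractions can be converted into an amount of incorporated sialic acid. The sketch below shows that conversion under two assumptions that are not part of the example: the counter output is already expressed as disintegrations per minute (i.e. corrected for counting efficiency), and the function name and example value are purely illustrative. The only physical constant used is 1 mCi = 2.22 x 10^9 dpm.

```python
import math

DPM_PER_MCI = 2.22e9  # 1 mCi = 3.7e7 Bq = 2.22e9 disintegrations per minute


def dpm_to_nmol(dpm, specific_activity_mci_per_mmol=4.4):
    """Convert disintegrations per minute into nmol of labelled compound."""
    if specific_activity_mci_per_mmol <= 0:
        raise ValueError("specific activity must be positive")
    dpm_per_mmol = specific_activity_mci_per_mmol * DPM_PER_MCI
    dpm_per_nmol = dpm_per_mmol / 1e6  # 1 mmol = 1e6 nmol
    return dpm / dpm_per_nmol


if __name__ == "__main__":
    # Hypothetical pooled-peak reading of 97,680 dpm.
    print(round(dpm_to_nmol(97680.0), 3))
```

At 4.4 mCi/mmol the label corresponds to roughly 9768 dpm per nmol, so counts in the range of tens of thousands of dpm represent only a few nmol of product.
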

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: CIBA-GEIGY AG 

(B) STREET: Klybeckstr. 141

(C) CITY: Basel

(E) COUNTRY: SCHWEIZ

(F) POSTAL CODE (ZIP): 4002

(G) TELEPHONE: +41 61 69 11 11

(H) TELEFAX: +41 61 696 79 76

(I) TELEX: 962 991

(ii) TITLE OF INVENTION: Proteins having glycosyltransferase activity

(iii) NUMBER OF SEQUENCES: 8

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1265 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(B) STRAIN: E. coli DH5alpha



(vii) IMMEDIATE SOURCE: 

(B) CLONE: p4AD113 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 7..1200

(D) OTHER INFORMATION: /product= "full-length galactosyltransferase"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 



GAATTC ATG AGG CTT CGG GAG CCG CTC CTG AGC GGC AGC GCC GCG ATG 48
Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met
1 5 10

CCA GGC GCG TCC CTA CAG CGG GCC TGC CGC CTG CTC GTG GCC GTC TGC 96
Pro Gly Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys
15 20 25 30

GCT CTG CAC CTT GGC GTC ACC CTC GTT TAC TAC CTG GCT GGC CGC GAC 144
Ala Leu His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp
35 40 45

CTG AGC CGC CTG CCC CAA CTG GTC GGA GTC TCC ACA CCG CTG CAG GGC 192
Leu Ser Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly
50 55 60

GGC TCG AAC AGT GCC GCC GCC ATC GGG CAG TCC TCC GGG GAG CTC CGG 240
Gly Ser Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg
65 70 75

ACC GGA GGG GCC CGG CCG CCG CCT CCT CTA GGC GCC TCC TCC CAG CCG 288
Thr Gly Gly Ala Arg Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro
80 85 90

CGC CCG GGT GGC GAC TCC AGC CCA GTC GTG GAT TCT GGC CCT GGC CCC 336
Arg Pro Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro
95 100 105 110

GCT AGC AAC TTG ACC TCG GTC CCA GTG CCC CAC ACC ACC GCA CTG TCG 384
Ala Ser Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser
115 120 125

CTG CCC GCC TGC CCT GAG GAG TCC CCG CTG CTT GTG GGC CCC ATG CTG 432
Leu Pro Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu
130 135 140

ATT GAG TTT AAC ATG CCT GTG GAC CTG GAG CTC GTG GCA AAG CAG AAC 480
Ile Glu Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn
145 150 155

CCA AAT GTG AAG ATG GGC GGC CGC TAT GCC CCC AGG GAC TGC GTC TCT 528
Pro Asn Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser
160 165 170

CCT CAC AAG GTG GCC ATC ATC ATT CCA TTC CGC AAC CGG CAG GAG CAC 576
Pro His Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His
175 180 185 190

CTC AAG TAC TGG CTA TAT TAT TTG CAC CCA GTC CTG CAG CGC CAG CAG 624
Leu Lys Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln
195 200 205

CTG GAC TAT GGC ATC TAT GTT ATC AAC CAG GCG GGA GAC ACT ATA TTC 672
Leu Asp Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe
210 215 220

AAT CGT GCT AAG CTC CTC AAT GTT GGC TTT CAA GAA GCC TTG AAG GAC 720
Asn Arg Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp
225 230 235

TAT GAC TAC ACC TGC TTT GTG TTT AGT GAC GTG GAC CTC ATT CCA ATG 768
Tyr Asp Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met
240 245 250

AAT GAC CAT AAT GCG TAC AGG TGT TTT TCA CAG CCA CGG CAC ATT TCC 816
Asn Asp His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser
255 260 265 270

GTT GCA ATG GAT AAG TTT GGA TTC AGC CTA CCT TAT GTT CAG TAT TTT 864
Val Ala Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe
275 280 285

GGA GGT GTC TCT GCT CTA AGT AAA CAA CAG TTT CTA ACC ATC AAT GGA 912
Gly Gly Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly
290 295 300

TTT CCT AAT AAT TAT TGG GGC TGG GGA GGA GAA GAT GAT GAC ATT TTT 960
Phe Pro Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe
305 310 315

AAC AGA TTA GTT TTT AGA GGC ATG TCT ATA TCT CGC CCA AAT GCT GTG 1008
Asn Arg Leu Val Phe Arg Gly Met Ser Ile Ser Arg Pro Asn Ala Val
320 325 330

GTC GGG AGG TGT CGC ATG ATC CGC CAC TCA AGA GAC AAG AAA AAT GAA 1056
Val Gly Arg Cys Arg Met Ile Arg His Ser Arg Asp Lys Lys Asn Glu
335 340 345 350

CCC AAT CCT CAG AGG TTT GAC CGA ATT GCA CAC ACA AAG GAG ACA ATG 1104
Pro Asn Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met
355 360 365

CTC TCT GAT GGT TTG AAC TCA CTC ACC TAC CAG GTG CTG GAT GTA CAG 1152
Leu Ser Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln
370 375 380

AGA TAC CCA TTG TAT ACC CAA ATC ACA GTG GAC ATC GGG ACA CCG AGC 1200
Arg Tyr Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Pro Ser
385 390 395

TAGGACTTTT GGTACAGGTA AAGACTGAAT TCATCGATAT CTAGATCTCG AGCTCGCGAA 1260
AGCTT 1265
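
As a consistency check on the feature table, the CDS location given for SEQ ID NO: 1 (7..1200) can be converted into a codon count and compared with the stated length of the encoded protein. The helper below is a minimal sketch written for this purpose; its name is hypothetical, and it assumes a simple, uninterrupted, 1-based inclusive location that excludes the stop codon, as in the listings here.

```python
def cds_codon_count(start, end):
    """Return the number of complete codons in a 1-based inclusive CDS location."""
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not precede start")
    if length % 3 != 0:
        raise ValueError("CDS length %d is not a multiple of three" % length)
    return length // 3


if __name__ == "__main__":
    # CDS locations as stated in the feature tables of this listing.
    for seq_id, (start, end) in {1: (7, 1200), 3: (15, 1232), 5: (1, 2301)}.items():
        print("SEQ ID NO: %d -> %d codons" % (seq_id, cds_codon_count(start, end)))
```

For SEQ ID NO: 1 this gives 1194 nucleotides, i.e. 398 codons, which agrees with the 398 amino acids stated for SEQ ID NO: 2.
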

(2) INFORMATION FOR SEQ ID NO: 2:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 398 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met Pro Gly
1 5 10 15

Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu
20 25 30

His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser
35 40 45

Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly Gly Ser
50 55 60

Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg Thr Gly
65 70 75 80

Gly Ala Arg Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro Arg Pro
85 90 95

Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser
100 105 110

Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro
115 120 125

Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu Ile Glu
130 135 140

Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn Pro Asn
145 150 155 160

Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His
165 170 175

Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His Leu Lys
180 185 190

Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln Leu Asp
195 200 205

Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe Asn Arg
210 215 220

Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp Tyr Asp
225 230 235 240

Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met Asn Asp
245 250 255

His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser Val Ala
260 265 270

Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe Gly Gly
275 280 285

Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly Phe Pro
290 295 300

Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe Asn Arg
305 310 315 320

Leu Val Phe Arg Gly Met Ser Ile Ser Arg Pro Asn Ala Val Val Gly
325 330 335

Arg Cys Arg Met Ile Arg His Ser Arg Asp Lys Lys Asn Glu Pro Asn
340 345 350

Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met Leu Ser
355 360 365

Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln Arg Tyr
370 375 380

Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Pro Ser
385 390 395



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1246 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(vi) ORIGINAL SOURCE:

(B) STRAIN: E. coli DH5alpha

(vii) IMMEDIATE SOURCE: 
(B) CLONE: pSIA2 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 15..1232

(D) OTHER INFORMATION: /product= "full-length sialyltransferase (EC 2.4.99.1)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CTGCAGAATT CAAA ATG ATT CAC ACC AAC CTG AAG AAA AAG TTC AGC TGC 50
Met Ile His Thr Asn Leu Lys Lys Lys Phe Ser Cys
1 5 10

TGC GTC CTG GTC TTT CTT CTG TTT GCA GTC ATC TGT GTG TGG AAG GAA 98
Cys Val Leu Val Phe Leu Leu Phe Ala Val Ile Cys Val Trp Lys Glu
15 20 25

AAG AAG AAA GGG AGT TAC TAT GAT TCC TTT AAA TTG CAA ACC AAG GAA 146
Lys Lys Lys Gly Ser Tyr Tyr Asp Ser Phe Lys Leu Gln Thr Lys Glu
30 35 40

TTC CAG GTG TTA AAG AGT CTG GGG AAA TTG GCC ATG GGG TCT GAT TCC 194
Phe Gln Val Leu Lys Ser Leu Gly Lys Leu Ala Met Gly Ser Asp Ser
45 50 55 60

CAG TCT GTA TCC TCA AGC AGC ACC CAG GAC CCC CAC AGG GGC CGC CAG 242
Gln Ser Val Ser Ser Ser Ser Thr Gln Asp Pro His Arg Gly Arg Gln
65 70 75

ACC CTC GGC AGT CTC AGA GGC CTA GCC AAG GCC AAA CCA GAG GCC TCC 290
Thr Leu Gly Ser Leu Arg Gly Leu Ala Lys Ala Lys Pro Glu Ala Ser
80 85 90

TTC CAG GTG TGG AAC AAG GAC AGC TCT TCC AAA AAC CTT ATC CCT AGG 338
Phe Gln Val Trp Asn Lys Asp Ser Ser Ser Lys Asn Leu Ile Pro Arg
95 100 105

CTG CAA AAG ATC TGG AAG AAT TAC CTA AGC ATG AAC AAG TAC AAA GTG 386
Leu Gln Lys Ile Trp Lys Asn Tyr Leu Ser Met Asn Lys Tyr Lys Val
110 115 120

TCC TAC AAG GGG CCA GGA CCA GGC ATC AAG TTC AGT GCA GAG GCC CTG 434
Ser Tyr Lys Gly Pro Gly Pro Gly Ile Lys Phe Ser Ala Glu Ala Leu
125 130 135 140

CGC TGC CAC CTC CGG GAC CAT GTG AAT GTA TCC ATG GTA GAG GTC ACA 482
Arg Cys His Leu Arg Asp His Val Asn Val Ser Met Val Glu Val Thr
145 150 155

GAT TTT CCC TTC AAT ACC TCT GAA TGG GAG GGT TAT CTG CCC AAG GAG 530
Asp Phe Pro Phe Asn Thr Ser Glu Trp Glu Gly Tyr Leu Pro Lys Glu
160 165 170

AGC ATT AGG ACC AAG GCT GGG CCT TGG GGC AGG TGT GCT GTT GTG TCG 578
Ser Ile Arg Thr Lys Ala Gly Pro Trp Gly Arg Cys Ala Val Val Ser
175 180 185

TCA GCG GGA TCT CTG AAG TCC TCC CAA CTA GGC AGA GAA ATC GAT GAT 626
Ser Ala Gly Ser Leu Lys Ser Ser Gln Leu Gly Arg Glu Ile Asp Asp
190 195 200

CAT GAC GCA GTC CTG AGG TTT AAT GGG GCA CCC ACA GCC AAC TTC CAA 674
His Asp Ala Val Leu Arg Phe Asn Gly Ala Pro Thr Ala Asn Phe Gln
205 210 215 220

CAA GAT GTG GGC ACA AAA ACT ACC ATT CGC CTG ATG AAC TCT CAG TTG 722
Gln Asp Val Gly Thr Lys Thr Thr Ile Arg Leu Met Asn Ser Gln Leu
225 230 235

GTT ACC ACA GAG AAG CGC TTC CTC AAA GAC AGT TTG TAC AAT GAA GGA 770
Val Thr Thr Glu Lys Arg Phe Leu Lys Asp Ser Leu Tyr Asn Glu Gly
240 245 250

ATC CTA ATT GTA TGG GAC CCA TCT GTA TAC CAC TCA GAT ATC CCA AAG 818
Ile Leu Ile Val Trp Asp Pro Ser Val Tyr His Ser Asp Ile Pro Lys
255 260 265

TGG TAC CAG AAT CCG GAT TAT AAT TTC TTT AAC AAC TAC AAG ACT TAT 866
Trp Tyr Gln Asn Pro Asp Tyr Asn Phe Phe Asn Asn Tyr Lys Thr Tyr
270 275 280

CGT AAG CTG CAC CCC AAT CAG CCC TTT TAC ATC CTC AAG CCC CAG ATG 914
Arg Lys Leu His Pro Asn Gln Pro Phe Tyr Ile Leu Lys Pro Gln Met
285 290 295 300

CCT TGG GAG CTA TGG GAC ATT CTT CAA GAA ATC TCC CCA GAA GAG ATT 962
Pro Trp Glu Leu Trp Asp Ile Leu Gln Glu Ile Ser Pro Glu Glu Ile
305 310 315

CAG CCA AAC CCC CCA TCC TCT GGG ATG CTT GGT ATC ATC ATC ATG ATG 1010
Gln Pro Asn Pro Pro Ser Ser Gly Met Leu Gly Ile Ile Ile Met Met
320 325 330

ACG CTG TGT GAC CAG GTG GAT ATT TAT GAG TTC CTC CCA TCC AAG CGC 1058
Thr Leu Cys Asp Gln Val Asp Ile Tyr Glu Phe Leu Pro Ser Lys Arg
335 340 345

AAG ACT GAC GTG TGC TAC TAC TAC CAG AAG TTC TTC GAT AGT GCC TGC 1106
Lys Thr Asp Val Cys Tyr Tyr Tyr Gln Lys Phe Phe Asp Ser Ala Cys
350 355 360



ACG ATG GGT GCC TAC CAC CCG CTG CTC TAT GAG AAG AAT TTG GTG AAG 1154
Thr Met Gly Ala Tyr His Pro Leu Leu Tyr Glu Lys Asn Leu Val Lys
365 370 375 380

CAT CTC AAC CAG GGC ACA GAT GAG GAC ATC TAC CTG CTT GGA AAA GCC 1202
His Leu Asn Gln Gly Thr Asp Glu Asp Ile Tyr Leu Leu Gly Lys Ala
385 390 395

ACA CTG CCT GGC TTC CGG ACC ATT CAC TGC TAAGCACAGG ATCC 1246
Thr Leu Pro Gly Phe Arg Thr Ile His Cys
400 405
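
The amino acid line printed beneath each codon line can be cross-checked by translating the codons with the standard genetic code. The sketch below does this for the first twelve codons of the coding region of SEQ ID NO: 3; the function and variable names are illustrative assumptions, and the table is the standard nuclear code rather than anything specific to this listing.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val",
}


def translate(dna):
    """Translate a coding sequence into three-letter residues, stopping at a stop codon."""
    dna = "".join(dna.split()).upper()  # drop the spaces between printed codons
    residues = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # TAA, TAG or TGA ends the reading frame
            break
        residues.append(THREE_LETTER[aa])
    return residues


if __name__ == "__main__":
    first_codons = "ATG ATT CAC ACC AAC CTG AAG AAA AAG TTC AGC TGC"
    print(" ".join(translate(first_codons)))
```

A codon that translates to a stop in the middle of a printed reading frame (for example TAG where the residue line shows Tyr) is a useful signal of a transcription error in the nucleotide line.
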



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 406 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Ile His Thr Asn Leu Lys Lys Lys Phe Ser Cys Cys Val Leu Val
1 5 10 15

Phe Leu Leu Phe Ala Val Ile Cys Val Trp Lys Glu Lys Lys Lys Gly
20 25 30

Ser Tyr Tyr Asp Ser Phe Lys Leu Gln Thr Lys Glu Phe Gln Val Leu
35 40 45

Lys Ser Leu Gly Lys Leu Ala Met Gly Ser Asp Ser Gln Ser Val Ser
50 55 60

Ser Ser Ser Thr Gln Asp Pro His Arg Gly Arg Gln Thr Leu Gly Ser
65 70 75 80

Leu Arg Gly Leu Ala Lys Ala Lys Pro Glu Ala Ser Phe Gln Val Trp
85 90 95

Asn Lys Asp Ser Ser Ser Lys Asn Leu Ile Pro Arg Leu Gln Lys Ile
100 105 110

Trp Lys Asn Tyr Leu Ser Met Asn Lys Tyr Lys Val Ser Tyr Lys Gly
115 120 125

Pro Gly Pro Gly Ile Lys Phe Ser Ala Glu Ala Leu Arg Cys His Leu
130 135 140

Arg Asp His Val Asn Val Ser Met Val Glu Val Thr Asp Phe Pro Phe
145 150 155 160

Asn Thr Ser Glu Trp Glu Gly Tyr Leu Pro Lys Glu Ser Ile Arg Thr
165 170 175

Lys Ala Gly Pro Trp Gly Arg Cys Ala Val Val Ser Ser Ala Gly Ser
180 185 190

Leu Lys Ser Ser Gln Leu Gly Arg Glu Ile Asp Asp His Asp Ala Val
195 200 205

Leu Arg Phe Asn Gly Ala Pro Thr Ala Asn Phe Gln Gln Asp Val Gly
210 215 220

Thr Lys Thr Thr Ile Arg Leu Met Asn Ser Gln Leu Val Thr Thr Glu
225 230 235 240

Lys Arg Phe Leu Lys Asp Ser Leu Tyr Asn Glu Gly Ile Leu Ile Val
245 250 255

Trp Asp Pro Ser Val Tyr His Ser Asp Ile Pro Lys Trp Tyr Gln Asn
260 265 270

Pro Asp Tyr Asn Phe Phe Asn Asn Tyr Lys Thr Tyr Arg Lys Leu His
275 280 285

Pro Asn Gln Pro Phe Tyr Ile Leu Lys Pro Gln Met Pro Trp Glu Leu
290 295 300

Trp Asp Ile Leu Gln Glu Ile Ser Pro Glu Glu Ile Gln Pro Asn Pro
305 310 315 320

Pro Ser Ser Gly Met Leu Gly Ile Ile Ile Met Met Thr Leu Cys Asp
325 330 335

Gln Val Asp Ile Tyr Glu Phe Leu Pro Ser Lys Arg Lys Thr Asp Val
340 345 350

Cys Tyr Tyr Tyr Gln Lys Phe Phe Asp Ser Ala Cys Thr Met Gly Ala
355 360 365

Tyr His Pro Leu Leu Tyr Glu Lys Asn Leu Val Lys His Leu Asn Gln
370 375 380

Gly Thr Asp Glu Asp Ile Tyr Leu Leu Gly Lys Ala Thr Leu Pro Gly
385 390 395 400

Phe Arg Thr Ile His Cys
405

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2304 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: E. coli DH5alpha

(vii) IMMEDIATE SOURCE: 

(B) CLONE: YEPGSTa 

(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 1..2301 

(D) OTHER INFORMATION: /product= "galactosyltransferase-sialyltransferase hybrid protein"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

ATG AGG CTT CGG GAG CCG CTC CTG AGC GGC AGC GCC GCG ATG CCA GGC 48
Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met Pro Gly
1 5 10 15

GCG TCC CTA CAG CGG GCC TGC CGC CTG CTC GTG GCC GTC TGC GCT CTG 96
Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu
20 25 30

CAC CTT GGC GTC ACC CTC GTT TAC TAC CTG GCT GGC CGC GAC CTG AGC 144
His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser
35 40 45

CGC CTG CCC CAA CTG GTC GGA GTC TCC ACA CCG CTG CAG GGC GGC TCG 192
Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly Gly Ser
50 55 60

AAC AGT GCC GCC GCC ATC GGG CAG TCC TCC GGG GAG CTC CGG ACC GGA 240
Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg Thr Gly
65 70 75 80

GGG GCC CGG CCG CCG CCT CCT CTA GGC GCC TCC TCC CAG CCG CGC CCG 288
Gly Ala Arg Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro Arg Pro
85 90 95

GGT GGC GAC TCC AGC CCA GTC GTG GAT TCT GGC CCT GGC CCC GCT AGC 336
Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser
100 105 110

AAC TTG ACC TCG GTC CCA GTG CCC CAC ACC ACC GCA CTG TCG CTG CCC 384
Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro
115 120 125

GCC TGC CCT GAG GAG TCC CCG CTG CTT GTG GGC CCC ATG CTG ATT GAG 432
Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu Ile Glu
130 135 140

TTT AAC ATG CCT GTG GAC CTG GAG CTC GTG GCA AAG CAG AAC CCA AAT 480
Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn Pro Asn
145 150 155 160

GTG AAG ATG GGC GGC CGC TAT GCC CCC AGG GAC TGC GTC TCT CCT CAC 528
Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His
165 170 175

AAG GTG GCC ATC ATC ATT CCA TTC CGC AAC CGG CAG GAG CAC CTC AAG 576
Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His Leu Lys
180 185 190

TAC TGG CTA TAT TAT TTG CAC CCA GTC CTG CAG CGC CAG CAG CTG GAC 624
Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln Leu Asp
195 200 205

TAT GGC ATC TAT GTT ATC AAC CAG GCG GGA GAC ACT ATA TTC AAT CGT 672
Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe Asn Arg
210 215 220

GCT AAG CTC CTC AAT GTT GGC TTT CAA GAA GCC TTG AAG GAC TAT GAC 720
Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp Tyr Asp
225 230 235 240

TAC ACC TGC TTT GTG TTT AGT GAC GTG GAC CTC ATT CCA ATG AAT GAC 768
Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met Asn Asp
245 250 255

CAT AAT GCG TAC AGG TGT TTT TCA CAG CCA CGG CAC ATT TCC GTT GCA 816
His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser Val Ala
260 265 270

ATG GAT AAG TTT GGA TTC AGC CTA CCT TAT GTT CAG TAT TTT GGA GGT 864
Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe Gly Gly
275 280 285

GTC TCT GCT CTA AGT AAA CAA CAG TTT CTA ACC ATC AAT GGA TTT CCT 912
Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly Phe Pro
290 295 300

AAT AAT TAT TGG GGC TGG GGA GGA GAA GAT GAT GAC ATT TTT AAC AGA 960
Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe Asn Arg
305 310 315 320

TTA GTT TTT AGA GGC ATG TCT ATA TCT CGC CCA AAT GCT GTG GTC GGG 1008
Leu Val Phe Arg Gly Met Ser Ile Ser Arg Pro Asn Ala Val Val Gly
325 330 335

AGG TGT CGC ATG ATC CGC CAC TCA AGA GAC AAG AAA AAT GAA CCC AAT 1056
Arg Cys Arg Met Ile Arg His Ser Arg Asp Lys Lys Asn Glu Pro Asn
340 345 350

CCT CAG AGG TTT GAC CGA ATT GCA CAC ACA AAG GAG ACA ATG CTC TCT 1104
Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met Leu Ser
355 360 365

GAT GGT TTG AAC TCA CTC ACC TAC CAG GTG CTG GAT GTA CAG AGA TAC 1152
Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln Arg Tyr
370 375 380

CCA TTG TAT ACC CAA ATC ACA GTG GAC ATC GGG ACA CGA GCT GGG ATC 1200
Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Arg Ala Gly Ile
385 390 395 400

CGT CGA CCT GCA GAA TTC CAG GTG TTA AAG AGT CTG GGG AAA TTG GCC 1248
Arg Arg Pro Ala Glu Phe Gln Val Leu Lys Ser Leu Gly Lys Leu Ala
405 410 415

ATG GGG TCT GAT TCC CAG TCT GTA TCC TCA AGC AGC ACC CAG GAC CCC 1296
Met Gly Ser Asp Ser Gln Ser Val Ser Ser Ser Ser Thr Gln Asp Pro
420 425 430

CAC AGG GGC CGC CAG ACC CTC GGC AGT CTC AGA GGC CTA GCC AAG GCC 1344
His Arg Gly Arg Gln Thr Leu Gly Ser Leu Arg Gly Leu Ala Lys Ala
435 440 445

AAA CCA GAG GCC TCC TTC CAG GTG TGG AAC AAG GAC AGC TCT TCC AAA 1392
Lys Pro Glu Ala Ser Phe Gln Val Trp Asn Lys Asp Ser Ser Ser Lys
450 455 460

AAC CTT ATC CCT AGG CTG CAA AAG ATC TGG AAG AAT TAC CTA AGC ATG 1440
Asn Leu Ile Pro Arg Leu Gln Lys Ile Trp Lys Asn Tyr Leu Ser Met
465 470 475 480

AAC AAG TAC AAA GTG TCC TAC AAG GGG CCA GGA CCA GGC ATC AAG TTC 1488
Asn Lys Tyr Lys Val Ser Tyr Lys Gly Pro Gly Pro Gly Ile Lys Phe
485 490 495

AGT GCA GAG GCC CTG CGC TGC CAC CTC CGG GAC CAT GTG AAT GTA TCC 1536
Ser Ala Glu Ala Leu Arg Cys His Leu Arg Asp His Val Asn Val Ser
500 505 510

ATG GTA GAG GTC ACA GAT TTT CCC TTC AAT ACC TCT GAA TGG GAG GGT 1584
Met Val Glu Val Thr Asp Phe Pro Phe Asn Thr Ser Glu Trp Glu Gly
515 520 525

TAT CTG CCC AAG GAG AGC ATT AGG ACC AAG GCT GGG CCT TGG GGC AGG 1632
Tyr Leu Pro Lys Glu Ser Ile Arg Thr Lys Ala Gly Pro Trp Gly Arg
530 535 540

TGT GCT GTT GTG TCG TCA GCG GGA TCT CTG AAG TCC TCC CAA CTA GGC 1680
Cys Ala Val Val Ser Ser Ala Gly Ser Leu Lys Ser Ser Gln Leu Gly
545 550 555 560

AGA GAA ATC GAT GAT CAT GAC GCA GTC CTG AGG TTT AAT GGG GCA CCC 1728
Arg Glu Ile Asp Asp His Asp Ala Val Leu Arg Phe Asn Gly Ala Pro
565 570 575

ACA GCC AAC TTC CAA CAA GAT GTG GGC ACA AAA ACT ACC ATT CGC CTG 1776
Thr Ala Asn Phe Gln Gln Asp Val Gly Thr Lys Thr Thr Ile Arg Leu
580 585 590

ATG AAC TCT CAG TTG GTT ACC ACA GAG AAG CGC TTC CTC AAA GAC AGT 1824
Met Asn Ser Gln Leu Val Thr Thr Glu Lys Arg Phe Leu Lys Asp Ser
595 600 605

TTG TAC AAT GAA GGA ATC CTA ATT GTA TGG GAC CCA TCT GTA TAC CAC 1872
Leu Tyr Asn Glu Gly Ile Leu Ile Val Trp Asp Pro Ser Val Tyr His
610 615 620

TCA GAT ATC CCA AAG TGG TAC CAG AAT CCG GAT TAT AAT TTC TTT AAC 1920
Ser Asp Ile Pro Lys Trp Tyr Gln Asn Pro Asp Tyr Asn Phe Phe Asn
625 630 635 640

AAC TAC AAG ACT TAT CGT AAG CTG CAC CCC AAT CAG CCC TTT TAC ATC 1968
Asn Tyr Lys Thr Tyr Arg Lys Leu His Pro Asn Gln Pro Phe Tyr Ile
645 650 655

CTC AAG CCC CAG ATG CCT TGG GAG CTA TGG GAC ATT CTT CAA GAA ATC 2016
Leu Lys Pro Gln Met Pro Trp Glu Leu Trp Asp Ile Leu Gln Glu Ile
660 665 670

TCC CCA GAA GAG ATT CAG CCA AAC CCC CCA TCC TCT GGG ATG CTT GGT 2064
Ser Pro Glu Glu Ile Gln Pro Asn Pro Pro Ser Ser Gly Met Leu Gly
675 680 685

ATC ATC ATC ATG ATG ACG CTG TGT GAC CAG GTG GAT ATT TAT GAG TTC 2112
Ile Ile Ile Met Met Thr Leu Cys Asp Gln Val Asp Ile Tyr Glu Phe
690 695 700

CTC CCA TCC AAG CGC AAG ACT GAC GTG TGC TAC TAC TAC CAG AAG TTC 2160
Leu Pro Ser Lys Arg Lys Thr Asp Val Cys Tyr Tyr Tyr Gln Lys Phe
705 710 715 720

TTC GAT AGT GCC TGC ACG ATG GGT GCC TAC CAC CCG CTG CTC TAT GAG 2208
Phe Asp Ser Ala Cys Thr Met Gly Ala Tyr His Pro Leu Leu Tyr Glu
725 730 735

AAG AAT TTG GTG AAG CAT CTC AAC CAG GGC ACA GAT GAG GAC ATC TAC 2256
Lys Asn Leu Val Lys His Leu Asn Gln Gly Thr Asp Glu Asp Ile Tyr
740 745 750

CTG CTT GGA AAA GCC ACA CTG CCT GGC TTC CGG ACC ATT CAC TGC 2301
Leu Leu Gly Lys Ala Thr Leu Pro Gly Phe Arg Thr Ile His Cys
755 760 765

TAA 2304

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 767 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met Pro Gly
1 5 10 15

Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu
20 25 30

His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser
35 40 45

Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly Gly Ser
50 55 60

Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg Thr Gly
65 70 75 80

Gly Ala Arg Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro Arg Pro
85 90 95

Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser
100 105 110

Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro
115 120 125

Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu Ile Glu
130 135 140

Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn Pro Asn
145 150 155 160

Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His
165 170 175

Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His Leu Lys
180 185 190

Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln Leu Asp
195 200 205

Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe Asn Arg
210 215 220

Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp Tyr Asp
225 230 235 240

Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met Asn Asp
245 250 255

His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser Val Ala
260 265 270

Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe Gly Gly
275 280 285

Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly Phe Pro
290 295 300

Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe Asn Arg
305 310 315 320

Leu Val Phe Arg Gly Met Ser Ile Ser Arg Pro Asn Ala Val Val Gly
325 330 335

Arg Cys Arg Met Ile Arg His Ser Arg Asp Lys Lys Asn Glu Pro Asn
340 345 350

Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met Leu Ser
355 360 365

Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln Arg Tyr
370 375 380

Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Arg Ala Gly Ile
385 390 395 400

Arg Arg Pro Ala Glu Phe Gln Val Leu Lys Ser Leu Gly Lys Leu Ala
405 410 415

Met Gly Ser Asp Ser Gln Ser Val Ser Ser Ser Ser Thr Gln Asp Pro
420 425 430

His Arg Gly Arg Gln Thr Leu Gly Ser Leu Arg Gly Leu Ala Lys Ala
435 440 445

Lys Pro Glu Ala Ser Phe Gln Val Trp Asn Lys Asp Ser Ser Ser Lys
450 455 460

Asn Leu Ile Pro Arg Leu Gln Lys Ile Trp Lys Asn Tyr Leu Ser Met
465 470 475 480

Asn Lys Tyr Lys Val Ser Tyr Lys Gly Pro Gly Pro Gly Ile Lys Phe
485 490 495

Ser Ala Glu Ala Leu Arg Cys His Leu Arg Asp His Val Asn Val Ser
500 505 510

Met Val Glu Val Thr Asp Phe Pro Phe Asn Thr Ser Glu Trp Glu Gly
515 520 525

Tyr Leu Pro Lys Glu Ser Ile Arg Thr Lys Ala Gly Pro Trp Gly Arg
530 535 540

Cys Ala Val Val Ser Ser Ala Gly Ser Leu Lys Ser Ser Gln Leu Gly
545 550 555 560

Arg Glu Ile Asp Asp His Asp Ala Val Leu Arg Phe Asn Gly Ala Pro
565 570 575

Thr Ala Asn Phe Gln Gln Asp Val Gly Thr Lys Thr Thr Ile Arg Leu
580 585 590

Met Asn Ser Gln Leu Val Thr Thr Glu Lys Arg Phe Leu Lys Asp Ser
595 600 605

Leu Tyr Asn Glu Gly Ile Leu Ile Val Trp Asp Pro Ser Val Tyr His
610 615 620

Ser Asp Ile Pro Lys Trp Tyr Gln Asn Pro Asp Tyr Asn Phe Phe Asn
625 630 635 640

Asn Tyr Lys Thr Tyr Arg Lys Leu His Pro Asn Gln Pro Phe Tyr Ile
645 650 655

Leu Lys Pro Gln Met Pro Trp Glu Leu Trp Asp Ile Leu Gln Glu Ile
660 665 670

Ser Pro Glu Glu Ile Gln Pro Asn Pro Pro Ser Ser Gly Met Leu Gly
675 680 685

Ile Ile Ile Met Met Thr Leu Cys Asp Gln Val Asp Ile Tyr Glu Phe
690 695 700

Leu Pro Ser Lys Arg Lys Thr Asp Val Cys Tyr Tyr Tyr Gln Lys Phe
705 710 715 720

Phe Asp Ser Ala Cys Thr Met Gly Ala Tyr His Pro Leu Leu Tyr Glu
725 730 735

Lys Asn Leu Val Lys His Leu Asn Gln Gly Thr Asp Glu Asp Ile Tyr
740 745 750

Leu Leu Gly Lys Ala Thr Leu Pro Gly Phe Arg Thr Ile His Cys
755 760 765

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2304 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE:

(B) STRAIN: E. coli DH5alpha 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: YEPGSTb 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..2301 

(D) OTHER INFORMATION: /product= "galactosyltransferase-sialyltransferase hybrid protein"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ATG AGG CTT CGG GAG CCG CTC CTG AGC GGC AGC GCC GCG ATG CCAGGC 48 
Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met ProGly 
15 10 15 

GCG TCC CTA CAG CGG GCC TGC CGC CTG CTC GTG GCC GTC TGC GCTCTG 96 
Ala Ser Leu Gin Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu 
20 25 30 

CAC CTT GGC GTC ACC CTC GTT TAC TAG CTG GCT GGC CGC GAC CTG AGC 144 
His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser 
35 40 45 

CGC CTG CCC CAA CTG GTC GGA GTC TCC ACA CCG CTG CAG GGC GGC TCG 192 
Arg Leu Pro Gin Leu Val Gly Val Ser Thr Pro Leu Gin Gly Gly Ser 
50 55 60 

AAC AGT GCC GCC GCC ATC GGG CAG TCC TCC GGG GAG CTC CGG ACC GGA 240 
Asn Ser Ala Ala Ala lie Gly Gin Ser Ser Gly Glu Leu Arg Thr Gly 
65 70 75 . 80 

GGG GCC CGG CCG CCG CCT CCT CTA GGC GCC TCC TCC CAG CCG CGC CCG 288 
Gly Ala Arg Pro Pro Pro Pro Leu Gly Ala Ser Ser Gin Pro Arg Pro 
85 - 90 95 

GGT GGC GAC TCC AGC CCA GTC GTG GAT TCT GGC CCT GGC CCC GCT AGC 336 
Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser 
100 105 110 

AAC TTG ACC TCG GTC CCA GTG CCC CAC ACC ACC GCA CTG TCG CTG CCC 384 
Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro 
115 120 125 
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GCC TGC CCT GAG GAG TCC CCG CTG CTT GTG GGC CCC ATG CTG ATT GAG 
Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu IleGlu 
130 135 140 



432 



TTT AAC ATG CCT GTG GAC CTG GAG CTC GTG GCA AAG CAG AAC CCAAAT 
Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gin Asn ProAsn 
145 150 155 160 



480 



GTG AAG ATG GGC GGC CGC TAT GCC CCC AGG GAC TGC GTC TCT CCTCAC 
Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His 
165 170 175 



528 



AAG GTG GCC ATC ATC ATT CCA TTC CGC AAC CGG CAG GAG CAC CTC AAG 
Lys Val Ala lie lie lie Pro Phe Arg Asn Arg Gin Glu His Leu Lys 
180 185 190 



576 



TAC TGG CTA TAT TAT TTG CAC CCA GTC CTG CAG CGC CAG CAG CTG GAC 
Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gin Arg Gin Gin Leu Asp 
195 200 205 



624 



TAT GGC ATC TAT GTT ATC . AAC CAG GCG GGA GAC ACT ATA TTC AAT CGT 
Tyr Gly lie Tyr Val lie Asn Gin Ala Gly Asp Thr lie Phe Asn Arg 
210 215 220 



672 



GCT AAG CTC CTC AAT GTT GGC TTT CAA GAA GCC TTG AAG GAC TAT GAC 
Ala Lys Leu Leu Asn Val Gly Phe Gin Glu Ala Leu Lys Asp Tyr Asp 
225 230 235 240 



720 



TAC ACC TGC TTT GTG TTT AGT GAC GTG GAC CTC ATT CCA ATG AAT GAC 
Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu lie Pro Met Asn Asp 
245 250 255 



768 



CAT AAT GCG TAC AGG TGT TTT TCA CAG CCA CGG CAC ATT TCC GTT GCA 
His Asn Ala Tyr Arg Cys Phe Ser Gin Pro Arg His lie Ser Val Ala 
260 265 270 



816 
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ATG GAT AAG TTT GGA TTC AGC CTA CCT TAT GTT GAG TAT TTT GGA GGT 864 
Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gin Tyr Phe GlyGly 
275 280 285 

GTC TCT GCT CTA AGT AAA CAA CAG TIT CTA ACC ATC AAT GGA TTT CCT 912 
Val Ser Ala Leu Ser Lys Gin Gin Phe Leu Thr lie Asn Gly Phe Pro 
290 295 300 

AAT AAT TAT TGG GGC TGG GGA GGA GAA GAT GAT GAC ATT TTT AAC AGA 960 
Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp lie Phe AsnArg 
305 310 315 320 

TTA GTT TTT AGA GGC ATG TCT ATA TCT CGC CCA AAT GCT GTG GTC GGG 1008 
Leu Val Phe Arg Gly Met Ser He Ser Arg Pro Asn Ala Val Val Gly 
325 330 335 

AGG TGT CGC ATG ATC CGC CAC TCA AGA GAC AAG AAA AAT GAA CCCAAT 1056 
Arg Cys Arg Met He Arg His Ser Arg Asp Lys Lys Asn Glu Pro Asn 
340 345 350 

CCT CAG AGG TTT GAC CGA ATT GCA CAC ACA AAG GAG ACA ATG CTC TCT 1104 
Pro Gin Arg Phe Asp Arg He Ala His Thr Lys Glu Thr Met Leu Ser 
355 360 365 

GAT GGT TTG AAC TCA CTC ACC TAG CAG GTG CTG GAT GTA CAG AiSATAC 1152 
Asp Gly Leu Asn Ser Leu Thr Tyr Gin Val Leu Asp Val Gin Arg Tyr 
370 375 380 

CCA TTG TAT ACC CAA ATC ACA GTG GAC ATC GGG ACA CGA GCT AGGATC 1200 
Pro Leu Tyr Thr Gin He Thr Val Asp He Gly Thr Arg Ala Arg He 
385 390 395 400 

CGT CGA CCT GCA GAA TTC CAG GTG TTA AAG AGT CTG GGG AAA TTG GCC 1248 
Arg Arg Pro Ala Glu Phe Gin Val Leu Lys Ser Leu Gly Lys Leu Ala 
405 410 415 
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ATG GGG TCT GAT TCC CAG TCT GTA TCC TCA AGC AGC ACC CAG GAG CCC 1296 

Met Gly Ser Asp Ser Gin Ser Val Ser Ser Sex Ser Thr Gin Asp Pro 

420 * 425 430 

CAG AGG GGC CGC CAG ACC CTC GGC AGT CTC AGA GGC CTA GCC AAGGCC 1344 
His Arg Gly Arg Gin Thr Leu Gly Ser Leu Arg Gly. Leu Ala LysAla 
435 440 445 

AAA CCA GAG GCC TCC TTC CAG GTG TGG AAC AAG GAC AGC TCT TCC AAA 1392
Lys Pro Glu Ala Ser Phe Gln Val Trp Asn Lys Asp Ser Ser Ser Lys
450 455 460

AAC CTT ATC CCT AGG CTG CAA AAG ATC TGG AAG AAT TAC CTA AGC ATG 1440
Asn Leu Ile Pro Arg Leu Gln Lys Ile Trp Lys Asn Tyr Leu Ser Met
465 470 475 480

AAC AAG TAC AAA GTG TCC TAC AAG GGG CCA GGA CCA GGC ATC AAG TTC 1488
Asn Lys Tyr Lys Val Ser Tyr Lys Gly Pro Gly Pro Gly Ile Lys Phe
485 490 495 

AGT GCA GAG GCC CTG CGC TGC CAC CTC CGG GAC CAT GTG AAT GTA TCC 1536
Ser Ala Glu Ala Leu Arg Cys His Leu Arg Asp His Val Asn Val Ser 
500 505 510 

ATG GTA GAG GTC ACA GAT TTT CCC TTC AAT ACC TCT GAA TGG GAG GGT 1584 
Met Val Glu Val Thr Asp Phe Pro Phe Asn Thr Ser Glu Trp Glu Gly 
515 520 525 

TAT CTG CCC AAG GAG AGC ATT AGG ACC AAG GCT GGG CCT TGG GGC AGG 1632
Tyr Leu Pro Lys Glu Ser Ile Arg Thr Lys Ala Gly Pro Trp Gly Arg
530 535 540

TGT GCT GTT GTG TCG TCA GCG GGA TCT CTG AAG TCC TCC CAA CTA GGC 1680
Cys Ala Val Val Ser Ser Ala Gly Ser Leu Lys Ser Ser Gln Leu Gly
545 550 555 560 
AGA GAA ATC GAT GAT CAT GAC GCA GTC CTG AGG TTT AAT GGG GCA CCC 1728
Arg Glu Ile Asp Asp His Asp Ala Val Leu Arg Phe Asn Gly Ala Pro
565 570 575

ACA GCC AAC TTC CAA CAA GAT GTG GGC ACA AAA ACT ACC ATT CGC CTG 1776
Thr Ala Asn Phe Gln Gln Asp Val Gly Thr Lys Thr Thr Ile Arg Leu
580 585 590

ATG AAC TCT CAG TTG GTT ACC ACA GAG AAG CGC TTC CTC AAA GAC AGT 1824
Met Asn Ser Gln Leu Val Thr Thr Glu Lys Arg Phe Leu Lys Asp Ser
595 600 605

TTG TAC AAT GAA GGA ATC CTA ATT GTA TGG GAC CCA TCT GTA TAC CAC 1872
Leu Tyr Asn Glu Gly Ile Leu Ile Val Trp Asp Pro Ser Val Tyr His
610 615 620

TCA GAT ATC CCA AAG TGG TAC CAG AAT CCG GAT TAT AAT TTC TTT AAC 1920
Ser Asp Ile Pro Lys Trp Tyr Gln Asn Pro Asp Tyr Asn Phe Phe Asn
625 630 635 640 

AAC TAC AAG ACT TAT CGT AAG CTG CAC CCC AAT CAG CCC TTT TAC ATC 1968 
Asn Tyr Lys Thr Tyr Arg Lys Leu His Pro Asn Gln Pro Phe Tyr Ile
645 650 655

CTC AAG CCC CAG ATG CCT TGG GAG CTA TGG GAC ATT CTT CAA GAA ATC 2016
Leu Lys Pro Gln Met Pro Trp Glu Leu Trp Asp Ile Leu Gln Glu Ile
660 665 670

TCC CCA GAA GAG ATT CAG CCA AAC CCC CCA TCC TCT GGG ATG CTT GGT 2064
Ser Pro Glu Glu Ile Gln Pro Asn Pro Pro Ser Ser Gly Met Leu Gly
675 680 685

ATC ATC ATC ATG ATG ACG CTG TGT GAC CAG GTG GAT ATT TAT GAG TTC 2112
Ile Ile Ile Met Met Thr Leu Cys Asp Gln Val Asp Ile Tyr Glu Phe
690 695 700 
CTC CCA TCC AAG CGC AAG ACT GAC GTG TGC TAC TAC TAC CAG AAG TTC 2160
Leu Pro Ser Lys Arg Lys Thr Asp Val Cys Tyr Tyr Tyr Gln Lys Phe
705 710 715 720

TTC GAT AGT GCC TGC ACG ATG GGT GCC TAC CAC CCG CTG CTC TAT GAG 2208 
Phe Asp Ser Ala Cys Thr Met Gly Ala Tyr His Pro Leu Leu Tyr Glu 
725 730 735 

AAG AAT TTG GTG AAG CAT CTC AAC CAG GGC ACA GAT GAG GAC ATC TAC 2256
Lys Asn Leu Val Lys His Leu Asn Gln Gly Thr Asp Glu Asp Ile Tyr
740 745 750

CTG CTT GGA AAA GCC ACA CTG CCT GGC TTC CGG ACC ATT CAC TGC 2301
Leu Leu Gly Lys Ala Thr Leu Pro Gly Phe Arg Thr Ile His Cys
755 760 765 

TAA 2304 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 767 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Arg Leu Arg Glu Pro Leu Leu Ser Gly Ser Ala Ala Met Pro Gly 
1 5 10 15

Ala Ser Leu Gln Arg Ala Cys Arg Leu Leu Val Ala Val Cys Ala Leu
20 25 30 
His Leu Gly Val Thr Leu Val Tyr Tyr Leu Ala Gly Arg Asp Leu Ser
35 40 45

Arg Leu Pro Gln Leu Val Gly Val Ser Thr Pro Leu Gln Gly Gly Ser
50 55 60

Asn Ser Ala Ala Ala Ile Gly Gln Ser Ser Gly Glu Leu Arg Thr Gly
65 70 75 80 

Gly Ala Arg Pro Pro Pro Pro Leu Gly Ala Ser Ser Gln Pro Arg Pro
85 90 95

Gly Gly Asp Ser Ser Pro Val Val Asp Ser Gly Pro Gly Pro Ala Ser
100 105 110

Asn Leu Thr Ser Val Pro Val Pro His Thr Thr Ala Leu Ser Leu Pro
115 120 125

Ala Cys Pro Glu Glu Ser Pro Leu Leu Val Gly Pro Met Leu Ile Glu
130 135 140

Phe Asn Met Pro Val Asp Leu Glu Leu Val Ala Lys Gln Asn Pro Asn
145 150 155 160

Val Lys Met Gly Gly Arg Tyr Ala Pro Arg Asp Cys Val Ser Pro His
165 170 175

Lys Val Ala Ile Ile Ile Pro Phe Arg Asn Arg Gln Glu His Leu Lys
180 185 190

Tyr Trp Leu Tyr Tyr Leu His Pro Val Leu Gln Arg Gln Gln Leu Asp
195 200 205

Tyr Gly Ile Tyr Val Ile Asn Gln Ala Gly Asp Thr Ile Phe Asn Arg
210 215 220 
Ala Lys Leu Leu Asn Val Gly Phe Gln Glu Ala Leu Lys Asp Tyr Asp
225 230 235 240

Tyr Thr Cys Phe Val Phe Ser Asp Val Asp Leu Ile Pro Met Asn Asp
245 250 255

His Asn Ala Tyr Arg Cys Phe Ser Gln Pro Arg His Ile Ser Val Ala
260 265 270 

Met Asp Lys Phe Gly Phe Ser Leu Pro Tyr Val Gln Tyr Phe Gly Gly
275 280 285

Val Ser Ala Leu Ser Lys Gln Gln Phe Leu Thr Ile Asn Gly Phe Pro
290 295 300

Asn Asn Tyr Trp Gly Trp Gly Gly Glu Asp Asp Asp Ile Phe Asn Arg
305 310 315 320

Leu Val Phe Arg Gly Met Ser Ile Ser Arg Pro Asn Ala Val Val Gly
325 330 335

Arg Cys Arg Met Ile Arg His Ser Arg Asp Lys Lys Asn Glu Pro Asn
340 345 350

Pro Gln Arg Phe Asp Arg Ile Ala His Thr Lys Glu Thr Met Leu Ser
355 360 365

Asp Gly Leu Asn Ser Leu Thr Tyr Gln Val Leu Asp Val Gln Arg Tyr
370 375 380

Pro Leu Tyr Thr Gln Ile Thr Val Asp Ile Gly Thr Arg Ala Arg Ile
385 390 395 400

Arg Arg Pro Ala Glu Phe Gln Val Leu Lys Ser Leu Gly Lys Leu Ala
405 410 415 
Met Gly Ser Asp Ser Gln Ser Val Ser Ser Ser Ser Thr Gln Asp Pro
420 425 430

His Arg Gly Arg Gln Thr Leu Gly Ser Leu Arg Gly Leu Ala Lys Ala
435 440 445

Lys Pro Glu Ala Ser Phe Gln Val Trp Asn Lys Asp Ser Ser Ser Lys
450 455 460

Asn Leu Ile Pro Arg Leu Gln Lys Ile Trp Lys Asn Tyr Leu Ser Met
465 470 475 480

Asn Lys Tyr Lys Val Ser Tyr Lys Gly Pro Gly Pro Gly Ile Lys Phe
485 490 495 

Ser Ala Glu Ala Leu Arg Cys His Leu Arg Asp His Val Asn Val Ser 
500 505 510 

Met Val Glu Val Thr Asp Phe Pro Phe Asn Thr Ser Glu Trp Glu Gly 
515 520 525

Tyr Leu Pro Lys Glu Ser Ile Arg Thr Lys Ala Gly Pro Trp Gly Arg
530 535 540

Cys Ala Val Val Ser Ser Ala Gly Ser Leu Lys Ser Ser Gln Leu Gly
545 550 555 560

Arg Glu Ile Asp Asp His Asp Ala Val Leu Arg Phe Asn Gly Ala Pro
565 570 575

Thr Ala Asn Phe Gln Gln Asp Val Gly Thr Lys Thr Thr Ile Arg Leu
580 585 590

Met Asn Ser Gln Leu Val Thr Thr Glu Lys Arg Phe Leu Lys Asp Ser
595 600 605 
Leu Tyr Asn Glu Gly Ile Leu Ile Val Trp Asp Pro Ser Val Tyr His
610 615 620

Ser Asp Ile Pro Lys Trp Tyr Gln Asn Pro Asp Tyr Asn Phe Phe Asn
625 630 635 640

Asn Tyr Lys Thr Tyr Arg Lys Leu His Pro Asn Gln Pro Phe Tyr Ile
645 650 655

Leu Lys Pro Gln Met Pro Trp Glu Leu Trp Asp Ile Leu Gln Glu Ile
660 665 670

Ser Pro Glu Glu Ile Gln Pro Asn Pro Pro Ser Ser Gly Met Leu Gly
675 680 685

Ile Ile Ile Met Met Thr Leu Cys Asp Gln Val Asp Ile Tyr Glu Phe
690 695 700

Leu Pro Ser Lys Arg Lys Thr Asp Val Cys Tyr Tyr Tyr Gln Lys Phe
705 710 715 720

Phe Asp Ser Ala Cys Thr Met Gly Ala Tyr His Pro Leu Leu Tyr Glu
725 730 735

Lys Asn Leu Val Lys His Leu Asn Gln Gly Thr Asp Glu Asp Ile Tyr
740 745 750

Leu Leu Gly Lys Ala Thr Leu Pro Gly Phe Arg Thr Ile His Cys
755 760 765 
Claims:

1. A protein having glycosyltransferase activity comprising identical or different catalytically
active domains of glycosyltransferases.

2. A protein according to claim 1 which is a hybrid protein. 

3. A protein according to claim 2 comprising a membrane-bound or soluble glycosyltransferase 
linked to a soluble glycosyltransferase. 

4. A protein according to claim 2 comprising a suitable linker consisting of genetically encoded 
amino acids. 

5. A protein according to claim 2 selected from the group consisting of the protein having the 
amino acid sequence depicted in SEQ ID NO. 5 and the protein having the amino acid sequence 
depicted in SEQ ID NO. 7.

6. A method for preparing a protein according to claim 2 comprising culturing a suitable 
transformed yeast strain under conditions which allow the expression of said protein. 

7. A DNA molecule coding for a protein according to claim 2.

8. A hybrid vector comprising a DNA molecule according to claim 7.

9. A transformed yeast strain comprising a hybrid vector according to claim 8. 

10. Use of a protein according to claim 1 for glycosylation. 